Application No.: 09/884,455 






Docket No.: 223002010004 



REMARKS 

Claims 27-36 are pending in the application. 

Reconsideration of the application is respectfully requested in view of the above
amendments and the following remarks. For the Examiner's convenience, Applicants' remarks are 
presented in the order in which they were raised in the Office Action. Pertinent opinions of experts 
in the forms of a Second Declaration under 37 C.F.R. 1.132 from Dr. Amy Weiner, and a 
Declaration under 37 C.F.R. 1.132 from Dr. J.-H. James Ou, accompany this response. 

A. Substance of Interview with the Examiner 

Applicants thank Examiner Prouty for the interview granted on February 3, 2006 with 
SPE Kathleen Kerr, Examiner William Moore, Applicants' representative Gladys Monroy, and 
experts Dr. Amy Weiner and Dr. James Ou. An interview summary was mailed by the Examiner on 
March 1, 2006.

All pending claims were discussed in view of the art disclosed by Eckart et al., Pallaoro
et al. and Thibeault et al. Two primary issues were discussed: (1) With respect to whether the
specification provides written description and enablement (how to use) for the serine protease activity
of SEQ ID NO: 65, Applicants argued that the Specification describes viral fragments and viral 
polyprotein as substrates and the art teaches that several substrates can be cleaved without the 
cofactor 4A. Agreement was not reached as the Examiner required that a suitable substrate for the 
serine protease according to one of skill in the art be addressed in the response. The Examiner's 
concerns are addressed in the following sections of this response. 

(2) With respect to the issue of whether the specification provides sufficient description 
and enablement of an active NS2/3 protease as shown by Example 5 with reference to the SOD 
fusions p600, p500 and p300, Applicants argued that the protease activity shown in Example 5 of 
the specification corresponds to NS2/3 protease and addition of any additional sequence to the 
amino terminus at residue 946 (herein provided by hSOD) was sufficient to produce an active 
protease. Agreement was not reached as the Examiner required that evidence of the stabilization of the
fusion constructs and an explanation of the discrepancy in observed molecular weight be provided in
the response. The Examiner's concerns are addressed in the following sections. 

B. Non-Statutory Double Patenting 

(a) Claims 27-30 remain rejected under the judicially created doctrine of obviousness- 
type double patenting as being unpatentable over claims 1-4 of U.S. Patent No. 5,585,258. 

Claims 31-35 remain rejected under the judicially created doctrine of obviousness-type 
double patenting as being unpatentable over claims 5-9 of U.S. Patent No. 5,585,258 in view of 
Benson et al., U.S. Patent No. 5,258,496. Benson is cited for the teaching of recombinant fusion 
polypeptides being comprised in compositions during purification from the host cell. 

Claim 36 remains rejected under the judicially created doctrine of obviousness-type 
double patenting as being unpatentable over claims 1 and 3-5 of U.S. Patent No. 5,597,691. 

Claims 27 and 30 remain rejected under the judicially created doctrine of obviousness- 
type double patenting as being unpatentable over claims 1 and 2 of U.S. Patent No. 5,712,145. 

Claims 31, 32 and 35 remain rejected under the judicially created doctrine of 
obviousness-type double patenting as being unpatentable over claims 3-5 of U.S. Patent No. 
5,712,145 in view of Benson et al., U.S. Patent No. 5,258,496. 

Claim 36 remains rejected under the judicially created doctrine of obviousness-type 
double patenting as being unpatentable over claims 7 and 8 of U.S. Patent No. 5,712,145. 

Applicants submit that they will file a terminal disclaimer in the present application to 
disclaim any term beyond the term of the earlier expiring patents in order to overcome this ground 
for rejection, after the conflicting claims are found to be allowable. 

(b) Claims 27 and 30 remain provisionally rejected under the judicially created doctrine
of obviousness-type double patenting as being unpatentable over claim 11 of copending Application
No. 10/409,094, which is an application for reissue of U.S. Patent No. 5,585,258.

Applicants submit that they will file a terminal disclaimer in the present application to
disclaim any term beyond the term of the earlier expiring patents in order to overcome this ground 
for rejection, after the conflicting claims are found to be allowable. 

Claim 36 remains provisionally rejected under the judicially created doctrine of 
obviousness-type double patenting as being unpatentable over claim 6 of copending Application No. 
10/409,673, which is an application for reissue of U.S. Patent No. 5,597,691.

Applicants submit that they will file a terminal disclaimer in the present application to 
disclaim any term beyond the term of the earlier expiring patents in order to overcome this ground 
for rejection, after the conflicting claims are found to be allowable. 

C. Claim Rejections under 35 U.S.C. § 112, First Paragraph 

(i) Rejection of claims 27-36 for lack of Written Description under 35 U.S.C. § 112, 
first paragraph 

Claims 27-36 stand rejected under 35 U.S.C. § 112, first paragraph, as containing subject
matter which was not described in the specification in such a way as to reasonably convey to one
skilled in the relevant art that the inventors, at the time the application was filed, had possession of
the claimed invention.

In particular, the Office Action states that the fusion proteins disclosed in Example 5 of 
Applicants' specification are insufficient for HCV-specific proteolysis even though they comprise 
specific amino acids His-952 and Cys-993 needed for NS2/NS3 protease cleavage. The Office 
Action raises the following arguments in support of its conclusion. Applicants' response to each of 
these arguments is also set forth below. 

Applicants respectfully traverse. "[T]he 'essential goal' of the description of the
invention requirement is to clearly convey the information that an Applicant has invented the
subject matter which is claimed." In re Barker, 559 F.2d 588, 592 n.4, 194 USPQ 470, 473 n.4
(CCPA 1977). The test for sufficiency of support in a parent application is whether the disclosure of
the application relied upon "reasonably conveys to the artisan that the inventor had possession at
that time of the later claimed subject matter." Ralston Purina Co. v. Far-Mar-Co., Inc., 772 F.2d
1570, 1575, 227 USPQ 177, 179 (Fed. Cir. 1985) (quoting In re Kaslow, 707 F.2d 1366, 1375, 217 
USPQ 1089, 1096 (Fed. Cir. 1983)). 

Applicants submit that the Specification of the '455 application discloses: 

(i) an NS3 domain hepatitis C virus protease that corresponds to the HCV NS3 serine 
protease activity, the protease comprising the entire domain required for NS3 serine protease, and 
discloses substrates for assaying its activity; and 

(ii) an NS3 domain hepatitis C virus protease that corresponds to the HCV NS2/3 
protease activity, the protease comprising the active site residues of the NS2/3 protease and 
displaying an autocatalytic activity of the protein fused to a hSOD protein as disclosed in Example 5 
of the Specification. 

Either activity is sufficient for practicing a "method for assaying compounds for activity 
against hepatitis C virus." 

The Examiner appears to contend that, because Applicants do not maintain that
Example 5 of the Specification relates to an HCV NS3 serine protease activity, Applicants are
required to rely solely on the NS2/3 protease activity in support of the claimed "NS3 domain HCV
protease." Applicants respectfully disagree. Applicants contend that the "NS3 domain HCV
protease" comprises both the NS2/3 protease activity disclosed in Example 5, and the NS3 serine 
protease activity disclosed inter alia in Examples 10 and 11. 

(a) The Specification discloses an HCV NS3 domain that has a NS3 serine 
protease activity. 

The Examiner concedes that the specification identifies and recognizes a serine protease 
activity. Office Action at page 6. The Office Action notes that the specification defines an HCV 
NS3 domain protease as having termini established "by expression and processing in an 
appropriate host of a DNA construct encoding the entire NS3 domain." (emphasis original). 
Office Action at page 6.

Applicants submit that the '455 application not only discloses the structure of the NS3
serine protease, it also teaches a method for making it by in vitro expression. (Declaration of Dr.
Ou, ¶¶ 9-11; Second Declaration of Dr. Weiner, ¶¶ 10-12; page 5, line 11 to page 6, line 4; page 7,
line 19 through page 8, line 6; pages 8-9, 37-39, Tables 1 and 2).

Further, the Examiner contends that the publication of Eckart et al. does not support a
constructive reduction to practice of a NS3 domain protease that either "comprises" or "consists of"
SEQ ID NO: 65 in the prophetic Examples 10 and 11 of the specification. According to the Office
Action, the various HCV-derived proteins expressed in Eckart et al., whether or not the Ser→Gly
mutation is present, share the entire NS3 domain and NS4A region, which are sufficient for cleaving 
at the NS3-NS4 boundary. Office Action, at 12-13. The Examiner suggests that the NS3 protease 
may not function in the absence of NS4A cofactor, which is absent from "the amino acid sequence 
of SEQ ID NO:65 or active NS3 domain hepatitis C virus protease truncation analog thereof." 
Applicants respectfully traverse. 

While NS4A has been shown to act as a co-factor with NS3 serine protease, NS3 serine 
protease cleaves several substrates in the absence of the cofactor NS4A as discussed below. 
(Declaration of Dr. Ou, ¶¶ 13-18; Second Declaration of Dr. Weiner, ¶¶ 14-19.)

While it is now known that the NS3 serine protease cleaves the HCV polyprotein at
multiple sites (NS3/4A, NS4A/4B, NS4B/5A and NS5A/5B), only the NS4B/5A cleavage is
dependent on the presence of NS4A. Bartenschlager et al. (J Virol. 68(8):5045-5055 (1994)).
Bartenschlager has also shown that the first 211 amino acids of NS3 were sufficient for processing 
at all trans sites. 

The cleavages at NS3/4A, NS4A/4B and NS5A/5B are carried out efficiently in trans
by the NS3 serine protease without NS4A.

By using an NS3-5B substrate with an inactivated serine proteinase domain, trans-cleavage
was observed at all sites except for the 3/4A site. Deletion of the inactive
proteinase domain led to efficient trans-processing at the 3/4A site. Smaller NS4A-4B
and NS5A-5B substrates were processed efficiently in trans; however, cleavage
of an NS4B-5A substrate occurred only when the serine proteinase domain was
coexpressed with NS4A.

Abstract, Lin et al., J. Virol. 68(12): 8147-8157 (1994). 

Others have reported the same observation that NS4A is not a necessary co-factor for 
NS3 serine protease activity: 

• Sardana et al. (Protein Expression and Purification 16:440-447 (1999)) showed 
that the NS4A cofactor is essential for "high" proteolytic activity of the NS3 
serine protease (see Abstract). However, Sardana also found that proteolysis at
the NS4A/4B junction is carried out at detectable levels by the NS3 serine 
protease in the absence of NS4A. (Sardana at 443, left col.). 

• Vishnuvardhan et al. (FEBS Lett. 402(2-3):209-212 (1997)) showed that a NS3 
serine protease representing amino acids 1027-1218 of the HCV polyprotein, and 
not including any NS4A region, cleaves the NS5A/5B junction in the absence of
NS4A. (Figs. 1 and 3). NS4A (amino acids 1658-1712; see Fig. 1) enhances the
cleavage but is not essential for it. (Fig. 3). Further, Vishnuvardhan classifies the
NS4A/4B cleavage site as an "NS4A-independent" cleavage site (at 211).

• Barbato et al., J. Mol. Biol. (1999) 289, 371-384, at 382, left col., states that
"[i]nteraction with the NS4A cofactor is required to perform the cleavages at
NS3/NS4A, NS4A/NS4B and NS4B/NS5A junctions but the proteinase in its
uncomplexed state is still able to cleave at the NS5A/NS5B boundaries, although
with a much lower activity."

In their expert declarations, Drs. Ou and Weiner note that the functionally minimal 
domain required for activity of the NS3 serine protease is composed of 146 amino acids, residues 
1059 to 1204 of the HCV polyprotein. (Yamada et al. Virology 246:104-112 (1998)). Figure 1 and
SEQ ID NO: 1 (page 6, line 26 to page 7, line 18) of the '455 application disclose a sequence that
encompasses the entire minimal domain of the NS3 serine protease. (Declaration of Dr. Ou, ¶¶ 18-20;
Second Declaration of Dr. Weiner, ¶¶ 19-21.)

It has also been shown, from a structural point of view, that the 146-amino-acid NS3
minimal domain can function by itself as an NS3 serine protease. Love et al. Cell 87:331-342
(1996).

Applicants respectfully submit that the specification discloses an "amino acid sequence 
of SEQ ID NO:65" and "NS3 domain hepatitis C virus protease or active NS3 domain hepatitis C 
virus protease truncation analog thereof" according to claims 1 and 6 that has HCV NS3 serine
protease activity. 

As discussed in detail below, under the enablement section, the specification also 
discloses substrates for "NS3 domain hepatitis C virus protease" such as full length viral polyprotein 
with the active residues disabled. 

Since the specification discloses both a "NS3 domain hepatitis C virus protease" and a 
substrate therefor, Applicants submit that the specification indicates that the inventors were in
possession of "a purified proteolytic hepatitis C virus (HCV) polypeptide wherein said HCV 
polypeptide comprises an HCV NS3 domain protease or an active HCV NS3 domain protease 
truncation analog" and respectfully request withdrawal of this ground for rejection. 

(b) The specification discloses an NS2/NS3 protease activity to those of ordinary 
skill in the art 

The Examiner contends that the specification does not convey the existence of an 
NS2/NS3 metalloprotease to those of ordinary skill in the art. In particular, the Examiner states that 
the fusion proteins disclosed in Example 5 of specification are insufficient for HCV-specific 
proteolysis even though they comprise specific amino acids His-952 and Cys-993 needed for 
NS2/NS3 metalloprotease cleavage. Office Action at page 4. 

It is uncontroverted that the fusion proteins of Example 1 contain amino acids 1-151 of
human SOD protein and amino acids 946-1630 of the HCV polyprotein corresponding to the HCV
NS3 domain protease sequence of Figure 1; the sequence corresponding to HCV includes
C-terminal residues of NS2 and a majority of the NS3 sequence but not including the NS3/4 boundary;
and the HCV sequence within this construct includes the putative NS2/3 cleavage site between
Leu-1026 and Ala-1027 as well as the catalytic residues His-952 and Cys-993. Office Action at
pages 4-5.

The Examiner does not agree that the specification inherently discloses a specific 
proteolytic activity that is native to the NS2/NS3 metalloprotease, and thus "the issue of whether the 
specification provides an adequate written disclosure of a claimed assay rests on whether the P600, 
P500, P300 and P190 proteins of Example 5 provide enough of the art-recognized structure of an
HCV NS2/NS3 metalloprotease to permit cleavage at the art-recognized NS2/NS3 cleavage site that 
is present in each of these proteins." Office Action at page 8. The Examiner argues that although 
certain NS2 sequences have been found to be necessary for NS2/NS3 metalloprotease activity 
(discussing the post-filing truncation studies of Hijikata et al., Grakoui et al., Reed et al., Santolini 
et al., Pieroni et al., Pallaoro et al. and Thibeault et al.), the fusion proteins of Example 5 lack these 
necessary NS2 sequences. Office Action at pages 9-11. The Office Action concludes that the 
specification fails to disclose or teach the structure of the portion of HCV polyprotein that is 
sufficient for NS2/NS3 metalloprotease-mediated cleavage at the NS2-NS3 boundary. 

Applicants respectfully traverse. Applicants submit that Example 5 relates to the NS2/3 
protease and not the NS3 serine protease. Applicants further submit that the observations of 
Example 5 can be explained by the hSOD fusion portion of the Applicants' constructs being capable 
of replacing amino acids 898-946 of HCV NS2/3 to create an active protease. 

From a review of the specification, one of skill in the art would understand that fusion of
a heterologous hSOD polypeptide sequence to a truncated NS2/3 protein that by itself is inactive
restored NS2/3 protease activity, as discussed in the Declaration of Dr. Ou, ¶¶ 37-39;
Second Declaration of Dr. Weiner, ¶¶ 37-39. Fusion with unrelated heterologous proteins is known
to restore the activity of inactive proteins or stabilize truncated proteins:

• Fusion of a heterologous polypeptide sequence to a truncated fragment of a
protein that by itself is inactive can restore activity to the fused protein fragment.
A fragment containing the first domain of the CD45 protein lacks phosphatase
activity, but fusion of this fragment to maltose-binding protein restores the
phosphatase activity. Lorenzo et al., FEBS. 411(2-3):231-5 (1997).

• Fusion with hSOD had been observed to stabilize the HIV protease (see
Pichuantes et al., J. Biol. Chem. 265(23), at p. 13892, col. 2 (1990)).

Since fusion of the NS2/3 fragments containing 299, 513 or 686 residues downstream
from residue 946 to the C-terminus of a 151-amino-acid hSOD fragment displayed NS2/3
specific protease activity, as shown in Example 5, one of skill in the art would understand from
Example 5 in the specification that fusion of the heterologous hSOD sequence to the NS2/3
fragments containing the 299, 513 or 686 residues was sufficient to restore NS2/3 specific protease
activity. (Declaration of Dr. Ou, ¶¶ 39-40; Second Declaration of Dr. Weiner, ¶¶ 39-40.)

The crystal structure of the NS2/3 protease has revealed that it is a dimeric protein where 
each monomer has an amino-terminal subdomain containing two antiparallel alpha-helices (H1 and
H2) and an active site is formed by a dimer interface comprising the critical residues for NS2-3
proteolytic activity, His-952 and Glu-972, located in the loop region following helix H2, and
Cys-993 located in the b1-b2 loop of the C-terminal subdomain. (Lorenz et al., Nature 442:831-835, at
para 3 of left column and para 1 of right column, and Figures 1 and 2, (2006); Declaration of Dr.
Ou, ¶ 41). The NS3 domain sequence of Figure 1 (SEQ ID NO: 70) includes all amino acids
involved in dimerization and formation of the active site of the NS2/3 protease. 

Applicants note that hSOD (Mr = 32,000) is a dimeric protein of 153 residues in each
monomer. (Parge et al., Proc. Natl. Acad. Sci. USA 89:6109-6113 (1992).) The N-terminal 151
residues of hSOD are fused to the NS3 domain sequences in Examples 4 and 5 of the Specification. 

Further, Applicants submit that stable and active viral protease fusion proteins were
known in the art prior to 1991. For example, it was known in 1991 that fusion of heterologous
sequences to the N-terminus of proteases does not affect the proteolytic activity of the protease.
(Declaration of Dr. Ou, ¶¶ 34-35; Second Declaration of Dr. Weiner, ¶¶ 34-35). Human
Immunodeficiency Virus (HIV) proteases remain active when a heterologous sequence is added to 
either terminus. The fused proteases mediate self-cleavage of viral polyproteins at the correct 
cleavage sites: 

• A fusion protein comprising sequences from chloramphenicol acetyltransferase 
enzyme and HIV-1 protease is capable of autoprocessing, and mutation of the 
active site residue results in incorrect cleavage. Montgomery et al., Biochem. 
Biophys. Res. Comm., 175(3):784-94 (1991). 

• An HIV protease fused to the amino or carboxy terminus of bacterial
β-galactosidase retains its capacity for specific autoprocessing. Valverde et al., J.
Gen. Vir. 73:639-51 (1992).

(c) Cleavage mediated by the fusion proteins of Example 5 corresponds to the
HCV NS2/NS3 protease cleavage site

The Examiner contends that the calculated molecular mass of the observed cleavage 
product (24.9 kDa) "is clearly much less than the 34 kD relative molecular mass reported in the 
specification." Office Action, at page 14. Whereas the specification states that the size of this 
product is 34 kDa, the Examiner finds the molecular weight of the protein fragment corresponding 
to a predicted 232 amino acid cleavage product should be 24.9 kDa. 

In response, Applicants submit that an anomaly in the estimates of molecular weights of 
proteins can be explained by a number of causes. Determination of exact molecular weight by SDS- 
polyacrylamide gel electrophoresis can be unpredictable. While SDS-polyacrylamide gel 
electrophoresis is often used to estimate molecular weights of proteins by comparing migration of 
proteins relative to a set of standard markers, it was well-known in 1991 that proteins and proteases 
do not necessarily migrate on SDS-polyacrylamide gels according to their predicted molecular 
weight. "[A]bnormalities in SDS binding or protein conformation, large differences in intrinsic
protein charge, ... may lead to increased or decreased electrophoretic mobilities; therefore caution is
advisable in use of this technique." Proteins: Structural and Molecular Principles. T. Creighton.
page 33. (WH Freeman and Co., New York, © 1984). "[D]iscrepancy between apparent relative
masses and real molecular weights underlies the uncertainty in deducing molecular masses of
membrane-bound proteins from their mobility in electrophoretic gels." Introduction to Protein
Structure. Branden C. and Tooze J. page 204 (Garland Publishing, Inc. New York and London ©
1991). (Declaration of Dr. Ou, ¶¶ 29-30; Second Declaration of Dr. Weiner, ¶¶ 29-30).

In Example 5, the 34 kDa size is estimated from a Western blot of an SDS-polyacrylamide
gel. ("[a] band corresponding to the hSOD fusion partner appeared at a relative
molecular weight of about 34." The '455 application, page 31, line 5 to page 32, line 12.)

As discussed in the Declaration of Dr. Ou, ¶ 32; Second Declaration of Dr. Weiner, ¶ 31,
several proteases, including a flavivirus NS2/3 protease, are known to migrate at
anomalous apparent molecular weights in SDS-polyacrylamide gel electrophoresis:

• A NS2B-NS3 fusion protein from Dengue virus (a member of the flavivirus
family which includes HCV) with a predicted molecular weight of 29.8 kDa
displays anomalous migration in SDS-polyacrylamide gel electrophoresis with a
higher apparent molecular mass of 37 kDa. Niyomrattanakit P., et al. J. Virol.
(2004) 78(24):13708-13716, at 13711, left column.

• A serine protease with a predicted molecular weight of 24.205 kDa was found to 
migrate at greater than 26 kDa possibly due to "the presence of bound [protein] 
defensin, possible posttranslational modifications of the protease, incomplete 
reduction of the protease during sample preparation or any combination of these 
possibilities." Hamilton JV et al., Insect Molecular Biology (2002) 11(3):197-205,
at 204, left column.

• The specification itself includes examples showing that estimates of molecular 
weights of known proteins from SDS-polyacrylamide gel electrophoresis were
not precisely according to the predicted theoretical size. For example, the
molecular weight of the 151 amino acid hSOD partner by itself was estimated by 
gel electrophoresis to be about 20 kDa at page 31, lines 15-1, whereas its 
theoretical size is 16.5 kDa. 


One of skill in the art would have understood the "34 kDa" band to correspond to the 
product of specific cleavage by the NS2/3 protease from the consistent observation of a 34 kDa 
band reactive to anti-HCV antisera described in Example 5 of the specification of the '455 
application, corresponding to the active fusion proteins P300, P500 and P600, but not with the 
inactive P190 fusion. (Declaration of Dr. Ou, ¶ 33; Second Declaration of Dr. Weiner, ¶ 33.)

One of skill in the art would have understood the inventors of the '455 application to
have possession of both species (the NS3 serine protease and the NS2/3 protease) encompassed
by the "Hepatitis C Virus NS3 domain protease." Therefore, Applicants respectfully request
withdrawal of this ground for rejection under 35 U.S.C. § 112, ¶ 1, for lack of written description.

If the rejection is maintained, Applicants request the Examiner to provide an affidavit 
under 37 C.F.R. 1.104(d)(2) stating facts within the knowledge of the Examiner as to why the 
rejection should be maintained. Applicants reserve the right to explain or contradict the assertion
with their own affidavits. 

(ii) Rejection of claims 29, 30, 33 and 34 for lack of possession 

Claims 29, 30, 33 and 34, drawn to genera of proteases comprising either SEQ ID NO:63 
or SEQ ID NO:64, remain rejected as lacking written description for reasons of record. The Office 
Action contends that because the specification fails to disclose, suggest or teach any further features 
needed for the NS2/NS3 metalloprotease activity, the structural features of SEQ ID NOS: 63 and
64 (or the fusion proteins of Example 5 that comprise SEQ ID NOS: 63 and 64) are insufficient to
produce a protein having the claimed proteolytic activity. Office Action, at page 15. 

In response, Applicants submit that claims 29, 30, 33 and 34, drawn to genera of 
proteases comprising either SEQ ID NO: 63 or SEQ ID NO: 64 are not directed to the NS2/3
metalloprotease activity, but instead relate to peptide sequences comprising essential active site
residues for HCV NS3 serine protease activity. SEQ ID NO: 63 is an 11 amino acid sequence
comprising the conserved HCV His-1083 residue of serine protease; and SEQ ID NO: 64 is a 9
amino acid sequence comprising the conserved Ser-1165 residue of the HCV NS3 serine protease.
These residues are essential for the catalytic activity of the serine protease (the '455 application,
page 7, line 19 through page 8, line 6 and Table 1). As discussed in detail above, the specification
discloses an NS3 domain sequence comprising an active NS3 serine protease region and substrates
therefor. Claims 29, 30, 33 and 34 depend from claim 27 and further specify conserved regions
comprising active site residues of the protease of claim 27. Therefore, Applicants respectfully 
request withdrawal of this ground for rejection. 

(iii) Rejection of claims 27-36 for lack of enablement under 35 U.S.C. § 112 

Claims 27-36 stand rejected under 35 U.S.C. § 112, first paragraph, because the
specification allegedly fails to reasonably provide enablement for practicing an assay to detect an
inhibitor of hepatitis C virus with a protease encoded by any of the P600, P500, P300 or P190
constructs or comprising more than the HCV amino acid sequence region present in SEQ ID NO: 68,
or a generic version thereof, or an active truncation analog thereof. According to the Office Action,
the disclosed NS2 and NS3 domain regions present in the P600, P500, P300 and P190 proteins that
Applicants propose are sufficient for the activity of an HCV NS2/NS3 metalloprotease have been
shown by the discoveries of others to actually be insufficient for NS2/NS3 metalloprotease activity.

To be enabling, the specification of the patent must teach those skilled in the art how to 
make and use the full scope of the claimed invention without undue experimentation. (Genentech
Inc. v. Novo Nordisk A/S, 108 F.3d 1361, 42 U.S.P.Q.2d 1001 (Fed. Cir. 1997)).

The Examiner states that the specification clearly does not teach how to make an HCV 
NS3 domain protease that has NS2/3 protease activity. 

Applicants respectfully traverse. As discussed in detail in the previous section and in 
arguments filed in response to the first Office Action, Example 5 provides a method for making and
using the NS2/3 protease by fusion of a peptide having the sequence of Figure 1, or truncation
analogs thereof, with a hSOD protein, and demonstrates the resulting auto-catalytic activity.

Further, Applicants note that as of the filing date of the parent application, April 4, 1991, 
fusion of a protein of interest to human superoxide dismutase (hSOD) sequence was an established 
method of achieving high-level expression of a stable fusion protein. (Declaration of Dr. Ou, ¶¶ 35-36;
Second Declaration of Dr. Weiner, ¶¶ 35-36). The specification of the '455 application discusses
the report by Pichuantes et al. of the expression of HIV protease as a fusion with human superoxide
dismutase (hSOD) having autocatalytic proteolytic activity. (Specification, page 2, lines 13-20; Declaration
of Dr. Ou, ¶ 35; Second Declaration of Dr. Weiner, ¶ 35).

Prior to 1991, examples of HIV proteases fused with hSOD and showing proteolytic 
activity for self-cleavage as well as cleavage using viral polyprotein substrates in trans, had been 
reported: 

• hSOD-HIV2 protease fusion from bacteria and yeast correctly processes HIV-1 
Pr53(gag) polyprotein in trans (Fig. 4). Pichuantes et al. J. Biol. Chem 
265(23):13890-13898 (1990). 

• A fusion protein of HIV-1 protease with human superoxide dismutase (hSOD)
expressed in yeast displayed correct self-processing, and trans-processing of gag- 
precursor Pr53gag substrate in in vitro assays (see Fig. 4, Table 1, Pichuantes et 
al., Proteins. 6:324-37 (1989)). 

Applicants submit that the specification of the '455 application discloses a polypeptide 
sequence in Figure 1, which contains the active site residues of the NS2/3 protease, and the cleavage 
site of the NS2/3 protease. The specification also discloses how to make a NS2/3 protease by fusion 
with a hSOD partner which was a method that one of skill in the art in 1991 was familiar with. The 
hSOD fusion displayed autocatalytic cleavage corresponding to the expected NS2/3 cleavage site. 
Thus, the specification shows how to make and use an active HCV NS2/3 protease.

(b) Further, Applicants submit that the specification discloses a structure for the NS3
serine protease and how to find a useful substrate without undue experimentation.

The '455 application not only discloses the structure of the NS3 serine protease, it also
teaches a method for making it by in vitro expression. (Declaration of Dr. Ou, ¶¶ 9-11; Second
Declaration of Dr. Weiner, ¶¶ 10-12). Page 20, lines 14-16 of the specification discloses full-length
polyprotein as a substrate for HCV protease. The specification also discloses use of alternative 
substrates in the form of "small peptides" (page 20, lines 21-26). 

Applicants submit that viral polyprotein substrates for assaying proteases were 
commonly used in the art at the time of the invention. See Declaration of Dr. Ou, ¶ 21; Second
Declaration of Dr. Weiner, ¶ 22. Protease assays using trans-cleavage of viral polyprotein substrates
were known in the art at the time of the filing of the invention. Inactivation of the active site in the
polyprotein substrate would enable one of skill in the art to assay protease activity in trans.
(Declaration of Dr. Ou, ¶ 23; Second Declaration of Dr. Weiner, ¶ 24.)

The following examples show the widespread use of viral polyproteins as substrates for 
viral proteases prior to 1991: 

• Processing of a 250 kDa Sindbis Virus polyprotein substrate (SI 234) in vitro by 
Sindbis Virus protease prepared by in vitro translation. de Groot et al., The 
EMBO J. 9(8):2631-2638 (1990). 

• Trans-cleavage of a poliovirus capsomer precursor protein by poliovirus 
proteinase 3C. Nicklin, J. Virol. (1988) 62:4586-4593. 

• Trans assay of MLV protease using Gazdar murine sarcoma virus (Gz-MSV) 
polyprotein Pr65(gag) substrate. Yoshinaka, Proc Natl Acad Sci USA. (1985) 
82(6):1618-1622. 

• Trans assay of FeLV protease using Gazdar murine sarcoma virus (Gz-MSV) 
polyprotein Pr65(gag) substrate. Yoshinaka, J. Virol. (1985) 55(3):870-873. 

• Trans assay of BLV protease using MLV polyprotein Pr65(gag) substrate. 
Yoshinaka et al., J. Virol. (1986) 57(3):826-832. 

• The proteinase of human immunodeficiency virus (HIV), expressed in 
Escherichia coli, shows rapid, efficient, and specific cleavage of an in vitro 
synthesized gag precursor polyprotein. Krausslich et al., Proc Natl Acad Sci U S 
A. (1989) 86(3): 807-811. 

• Processing of HIV-1 Pr53(gag) polyprotein substrate in trans (Fig. 4) by hSOD-HIV2 
protease fusion from bacteria and yeast. Pichuantes et al. J. Biol. Chem 
265(23):13890-13898 (1990). 

• A fusion protein comprising HIV-1 protease fused with human superoxide 
dismutase (hSOD) expressed in yeast displayed correct self-processing, and 
trans-processing in vitro of a gag-precursor Pr53gag polyprotein substrate (see 
Fig. 4, Table 1, Pichuantes et al., Proteins. 6:324-37 (1989)). 

Applicants note that a substrate for the serine protease activity in the form of genomic 
HCV polyprotein is disclosed at page 20, lines 14-16 of the specification. Page 21, lines 4-5 
explains that "[i]n the absence of this protease activity, the HCV polyprotein should remain in its 
unprocessed form." (Declaration of Dr. Ou, ¶ 22; Second Declaration of Dr. Weiner, ¶ 23.) 

A method for inactivating the HCV protease activity in an HCV polyprotein by a single 
point mutation "substituting Ala for Serl21 " is disclosed at page 22, line 27 to page 23, line 15 of 
the specification. One of skill in the art would have understood that this method can be used to 
inactivate the NS3 serine protease activity of the genomic HCV polyprotein - such that it can then 
be used as a substrate for testing NS3 serine protease activity in trans. (Declaration of Dr. Ou, ¶ 23; 
Second Declaration of Dr. Weiner, ¶ 24.) 

In fact, Lin et al. applied the method disclosed in the specification, using a 
substrate with an inactivated serine proteinase domain to assay trans-cleavage by the NS3 serine 
protease. (Lin et al., J. Virol. 68(12): 8147-8157 (1994)) (Declaration of Dr. Ou, ¶ 24; Second 
Declaration of Dr. Weiner, ¶ 25.) 

Thus, one of skill in the art would understand that the '455 application describes an NS3 
serine protease based on comparison with related flavivirus proteases and identification of critical 
amino acid residues of the serine triad. One of skill in the art would also understand that a substrate 
for the NS3 serine protease is disclosed in the '455 application in the form of genomic HCV 
polyprotein. (Declaration of Dr. Ou, ¶ 25; Second Declaration of Dr. Weiner, ¶ 26.) 

Further, one of skill in the art would also understand that a substrate for the NS3 serine 
protease activity in the absence of NS4A cofactor is disclosed in the '455 application in the form of 
genomic HCV polyprotein. (Declaration of Dr. Ou, ¶ 25; Second Declaration of Dr. Weiner, ¶ 26.) 

The '455 application discloses both the NS3 serine protease and the NS2/3 protease 
encoded by the claimed Hepatitis C Virus NS3 domain protease or a truncation analog thereof. 
Candidate protease inhibitors (page 21, line 12 - page 22, line 11) and methods for screening for 
such inhibitors (page 22, lines 12-26) are disclosed in the Specification. Thus, the '455 application 
teaches how to make and use a method for "assaying compounds for activity against hepatitis C 
virus" according to independent claims 27 and 36. Therefore, Applicants respectfully request 
withdrawal of this ground for rejection. 

D. Claim Rejections under 35 U.S.C. § 112, Second Paragraph 

(i) Rejection of claims 27-36 as being indefinite 

According to the Office Action, claim 1 (sic) currently recites, "an NS3 domain hepatitis 
C virus protease or an active . . . truncation analog" (emphasis supplied in Office Action), but no 
sequence identifier is present in claim 1 to indicate the metes and bounds of the intended subject 
matter, such as where the truncation(s) might begin. See Office Action at page 17. For instance, 
the public cannot determine what is more or less than the "domain" recited in the claims. Although 
Applicants argued that SEQ ID NO:70 should have the features sufficient for NS2/NS3 protease 
activity, the Office Action asserts that this sequence can alternatively, thus ambiguously, be 
construed to be an active truncation analog of an NS3 protease as well. 

In response, Applicants submit that independent claims 27 and 36 specify a "purified 
proteolytic HCV polypeptide." Thus, one of skill in the art would understand that the "NS3 domain 
hepatitis C virus . . . protease truncation analog" must possess a "proteolytic" activity. The NS3 
domain of HCV is characterized by the sequence of Figure 1 (SEQ ID NO: 70). The specification 
describes a truncation analog as: "the sequence may be substantially truncated, particularly at the 
carboxy terminus, apparently with full retention of protease activity." (page 8, lines 1-3). 

As discussed in detail above, the proteins of Example 5 disclose a full length HCV 
domain protease (P600), truncation analogs that are active (P500, P300), and those that are not 
(P190). P600 includes all amino acids of Figure 1 (SEQ ID NO: 70). Thus, one of skill in the art 
would know that a truncation analog of SEQ ID NO: 70 would be a polypeptide that is missing 
amino acid residues from the full length NS3 domain but retains "proteolytic activity." 

Figure 1 and SEQ ID NO:1 (pages 6-7) and SEQ ID NO:70 of the '455 application 
disclose a sequence that encompasses the entire minimal domain of the NS3 serine protease. 
(Declaration of Dr. Ou, ¶¶ 18-20; Second Declaration of Dr. Weiner, ¶¶ 19-21.) Further, SEQ ID 
NO: 65, which contains residues 1005 to 1204 of the HCV polyprotein, also includes all residues 
necessary for HCV NS3 serine protease activity. 

In their expert declarations, Drs. Ou and Weiner note that the functionally minimal 
domain required for activity of the NS3 serine protease is composed of 146 amino acids, residues 
1059 to 1204 of the HCV polyprotein. (Yamada et al. Virology 246: 104-112 (1998)). It has also 
been shown that the 146-amino-acid NS3 minimum domain can function by itself as an NS3 
serine protease from a structural point of view. Love et al. Cell 87: 331-342 (1996). Thus, 
truncation analogs of the sequences of Fig. 1 and SEQ ID NO: 65 that retain NS3 serine protease 
activity would be readily available to one of skill in the art. Methods for truncation of an expressed 
amino acid sequence at either end by use of an expression vector and an exonuclease activity were 
routine in the art and disclosed at page 7, lines 28-31 of the Specification. 


Therefore, Applicants respectfully request withdrawal of this ground for rejection under 
35 U.S.C. § 112, Second Paragraph for alleged indefiniteness. 

CONCLUSION 



In view of the above, each of the presently pending claims in this application is believed 
to be in immediate condition for allowance. Accordingly, the Examiner is respectfully requested to 
withdraw the outstanding rejection of the claims and to allow this application to issue. If it is 
determined that a telephone conference would expedite the prosecution of this application, the 
Examiner is invited to telephone the undersigned at the number given below. 

In the event the U.S. Patent and Trademark Office determines that an extension and/or 
other relief is required, applicant petitions for any required relief including extensions of time and 
authorizes the Commissioner to charge the cost of such petitions and/or other fees due in connection 
with the filing of this document to Deposit Account No. 03-1952 referencing docket no. 
223002010004. However, the Commissioner is not authorized to charge the cost of the issue fee to 
the Deposit Account. 

Dated: February 12, 2007 Respectfully submitted, 




Shantanu Basu 

ABSTRACT Superoxide dismutase enzymes protect aerobic 
organisms from oxygen-mediated free-radical damage. 
Crystallographic structures of recombinant human Cu,Zn 
superoxide dismutase have been determined, refined, and 
analyzed at 2.5 Å resolution for wild-type and a designed 
thermostable double-mutant enzyme (Cys-6 → Ala, Cys-111 → 
Ser). The 10 subunits (five dimers) in the crystallographic 
asymmetric unit form an unusual stable open lattice with 
80-Å-diameter channels. The 10 independently fit and refined 
subunits provide high accuracy, error analysis, and insights on 
loop conformations. There is a helix dipole interaction with the 
Zn site, and 14 residues form two or more structurally conserved 
side-chain to main-chain hydrogen bonds that appear 
critical to active-site architecture, loop conformation, and the 
increased stability resulting from the Cys-111 → Ser mutation. 



Superoxide is a normal by-product of aerobic metabolism and 
is produced in numerous reactions, including oxidative phos- 
phorylation, photosynthesis, and the respiratory burst of 
stimulated neutrophils and macrophages (1, 2). The Cu,Zn 
superoxide dismutase (SOD) enzymes occur primarily in 
eukaryotes but have also been found in bacterial pathogens 
and symbionts. Major interests in SOD structure and function 
include the enzyme's Cu(II) and Zn(II) binding sites and 
active-site geometry (3), highly stable fold (4, 5), and ex- 
tremely rapid reaction rate due to long-range electrostatic 
guidance of the anion substrate (6, 7). These essential SOD 
enzymes are critical components in the physiological re- 
sponse to oxygen toxicity and are actively investigated as 
potential therapeutic agents in pathological conditions re- 
lated to oxidative stress [e.g., reperfusion damage following 
ischemia (8, 9), lung and tissue damage (10, 11), and general 
inflammation (12)]. Recently, effective mitigation of pulmo- 
nary oxygen toxicity resulted in Food and Drug Administra- 
tion designation of SOD as an orphan drug for prevention of 
bronchopulmonary dysplasia in premature infants, and ex- 
tensions to other applications are expected (13, 14). SOD is 
also involved in the pervasive bioregulatory functions of 
nitric oxide by preventing nitric oxide peroxidation by su- 
peroxide (15, 16). The biological and medical importance of 
SOD has stimulated efforts to develop mutant enzymes with 
potentially improved clinical effectiveness due to increased 
stability (17, 18) or serum half-life (19, 20). Identification of 
Cu,Zn SOD enzymes in parasites (21, 22) suggests the design 
of selective inhibitors that might block parasite SOD without 
affecting the human enzyme. All of these studies have been 
hampered by the lack of a structure for human SOD (HSOD), 
which has been cloned and expressed in yeast (23, 24). Here 
we report the determination, refinement, and analysis of the 
wild-type and the designed thermostable mutant human enzymes 
at 2.5 Å resolution.§ 

MATERIALS AND METHODS 

Wild-type and thermostable mutant (Cys-6 → Ala, Cys-111 → 
Ser) HSODs were purified from recombinant yeast and 
crystallized in space group C222₁, a = 205.2, b = 167.0, c = 
145.5 Å (25), with 10 subunits per asymmetric unit. Diffraction 
data comprising 742,579 observations of 77,344 unique 
reflections (88.4% complete to 2.48 Å) collected on the more 
stable mutant HSOD with a Xentronics area detector (Siemens 
Analytic X-Ray Instruments, Madison, WI) were used 
for the structure solution. The structure was solved by 
molecular replacement with the MERLOT (26) program package 
using the "humanized" dimer of bovine SOD as the 
probe. The solution was confirmed by using ethylmercuric 
phosphate, 3-chloromercuri-2-methylpropylurea, and potassium 
gold cyanide derivatives. The current model, which 
contains 1530 amino acids and 10 Cu, 10 Zn, and 2 sulfate ions 
and 499 water molecules, has been crystallographically refined 
with X-PLOR (27) to a residual error (R factor) of 20.2% 
for 63,790 reflections (>3σ) between 10.0 and 2.5 Å resolution 
and 18.9% for 54,476 reflections (>5σ) between 6.0 and 
2.5 Å resolution. Overall deviations from ideal geometry are 
0.017 Å for bond distances and 3.5° for bond angles. For 
wild-type HSOD, 838,068 observations of 96,710 unique 
reflections were collected on film to 2.0 Å resolution using 
synchrotron radiation. These data are 81.5% complete (70,227 
reflections) to 2.5 Å resolution for intensities >2σ. The 
refined model has an R factor of 21.0% for 69,714 reflections 
(>3σ) between 10.0 and 2.5 Å resolution, with overall deviations 
from ideal geometry of 0.023 Å for bond distances and 
4.3° for bond angles. 

RESULTS AND DISCUSSION 

Crystal Lattice and Subunit Structure. HSOD (Mr = 32,000) 
is a dimeric enzyme with ellipsoidal dimensions of about 30 
× 40 × 70 Å. Each identical subunit contains 153 residues, 
one Cu ion, and one Zn ion. The five HSOD dimers (named 
A, B, C, D, and E; each has subunits 1 and 2) pack to form 
a very open but well-ordered crystal lattice with over 68% 
solvent (Fig. 1). Each of the dimers is internally related by a 
noncrystallographic twofold axis approximately parallel to 
the crystallographic c axis. The five-dimer asymmetric units 
generate a honeycomb layer in the ab plane (Fig. 1A), 
composed of dog-bone-shaped building blocks consisting of 
two trimers of dimers, with the central C dimer participating 
in both trimers. These building blocks pack with two types of 
close interactions along whole subunit faces, at the ends and 
sides of the dog bone extremities (Fig. 1B). In the third 
dimension, along the c axis, the honeycomb layers are 



Abbreviations: HSOD, human superoxide dismutase; SOD, super- 
oxide dismutase. 

*To whom reprint requests should be addressed. 

§The atomic coordinates for the 10 subunits of the thermostable 
mutant HSOD have been deposited in the Protein Data Bank, 
Chemistry Department, Brookhaven National Laboratory, Upton, 
NY 11973 (file 1SOS). 

Fig. 1. Packing of the HSOD molecules in the crystal lattice, viewed down the c axis with the a axis horizontal and the b axis vertical. Five 
HSOD dimers (10 subunits) make up the crystallographic asymmetric unit. (A) Electron density map contours (blue) rendered in a solid raster 
view to show the interlocked packing of HSOD dimers in the ab plane, producing both high-resolution diffraction and the open lattice. The 
five-dimer, dog bone-shaped, asymmetric units (which lie along diagonals from the lower left to the upper right) are formed by two overlapping 
trimers of dimers. Packing of the next ab layer of the crystal lattice (not shown) constricts the open channels slightly, to about 80 Å in diameter. 
(B) The five HSOD dimers in an alternative asymmetric unit. The α-carbon backbones are shown with thicker tubes, and the individual side 
chains are shown with thinner tubes. Cu (gold sphere) and Zn (blue sphere) ions show the locations of the active sites. Subunits are color-coded 
in dimer pairs. The green A, blue B, and yellow C dimers form one trimer, with the A2, B2, and C2 subunits associating around a sulfate group 
(red O and yellow S atoms). The purple D and red E dimers form two-thirds of the next trimer of dimers (upper right), completed by the C dimer 
of the adjacent asymmetric unit. The D2, E2, and C1 subunits of this second trimer surround another sulfate group (upper right). The 
diamond-shaped links formed by the green, blue, purple, and red dimers form the zigzag chains running diagonally through the lattice from upper 
left to lower right in A. These chains are cross-braced by the yellow C dimers, which are exposed to the large lattice cavities and form the middle 
of the dog-bone building block. In the next layer of the crystal lattice, generated by the twofold screw axis along c through the blue B dimer, 
the next B dimer will pack on top of itself, but the next red E dimer will be offset below B and beside the yellow C dimer. 



stacked on each other (by the twofold axis along b) and 
staggered by one dimer width along the b axis. This leaves 
only the E dimer without continuous contacts along c and 
reduces the cross-section of the hexagonal channels along c 
from 120 × 120 to 120 × 80 Å. There are other proteins that 
form open crystal lattices with large solvent channels (28, 29), 
but they are less highly ordered than the 2 Å resolution 
diffraction of these HSOD crystals. 

The mutant and wild-type HSOD structures confirm that 
the overall flattened eight-stranded β-barrel fold, metal-site 
ligands and geometry, and dimer contact match that of the 
bovine SOD structure (30, 31). Superposition of the HSOD 
and bovine SOD structures indicated that the sequence 
changes do not alter the β-barrel diameter, strand angles, or 
loop conformations except in the region of the two-amino-acid 
insertion at sequence position 25. Loops form the 
active-site channel and connect the antiparallel β-strands 
with a Greek-key topology of +1, +1, +3, -1, -1, +3, +1. 

Tenfold Redundancy. To assess structural accuracy and to 
distinguish conserved versus variable loop conformations in 
different crystal-packing environments, we did not average 
the electron density but instead fitted and refined each of the 
10 HSOD subunits independently. This allows identification 
of specific patterns that provide the basis for loop conformation 
and stability, for this and other β-barrel proteins. 

The unaveraged electron density shows structural details 
clearly, including carbonyl oxygens, bound water molecules, 
and crystal contact interactions (Fig. 2A). Similarly, the 
His-43 side-chain interactions that tie the short Greek-key 
connection to the active site (Fig. 2B) are well defined in the 
maps of each of the subunits. Superposition of the 10 subunits 
(Fig. 3A) shows that the loops and chain termini are only 
slightly more variable than the β-barrel framework. For all 
residues including crystal contacts, the average root-mean-square 
deviations in atomic positions are 0.38 Å for α-carbons, 
0.43 Å for main-chain atoms, and 0.85 Å for all atoms. 
Atomic coordinates for all 10 subunits are available in the 



Protein Data Bank, for purposes of error analysis and for 
comparison with sets of multiple structures obtained by 
nuclear magnetic resonance methods. 

Trp-32. The single tryptophan in HSOD is important in 
spectroscopic studies. It lies on the outside of the β-barrel, 
exposed in the dimer, but involved in some crystal contacts 
(Fig. 2A). Its conformation is completely invariant in all 10 
subunits, however, with χ1 ≈ +60° and χ2 ≈ +90°; one side 
of the ring lies against and Cγ of Asn-19, while the other 
side and the ring N are exposed. In some but not all subunits, 
an ordered water hydrogen bonds with Nε1. Nearby (at the 
bottom of Fig. 2A), there is a stripe of alternating charges: 
Glu-100, Lys-30, Glu-21, Lys-3, and the C terminus. Most of 
this environment around Trp-32 is conformationally invari- 
ant, but Lys-30 varies widely. 

Critical Side-Chain Hydrogen Bonds. There are seven loops 
in SOD, numbered in sequence order. Loops I and V are 
short β-hairpin connections between adjacent β-strands. The 
five longer loops are shown in Fig. 3B. Loop II (residues 
24-27) forms the β-hairpin containing the two-residue insertion 
relative to bovine SOD. Loops III (residues 37-40) and 
VI (residues 102-114) form the two Greek-key β-barrel 
connections. The active-site channel with its bound metal 
ions is formed between the electrostatic loop VII (121-144), 
implicated in substrate attraction, and loop IV (49-84), made 
up of the disulfide and the Zn-ligand subloop regions. 

Compared to the β-barrel framework, the loops have more 
hydrogen bonds and other hydrophilic side-chain interactions. 
Most (>70 out of about 85) of the main-chain hydrogen 
bonds and ≈80 side-chain to main-chain hydrogen bonds in 
each subunit are conserved in all subunits, although most 
exposed polar side chains show significant conformational 
variability. At critical positions within or near the loops, 14 
sequence-conserved (32), structurally conserved side chains 
appear to play important roles in loop conformation and 
interactions. Each forms two or more conserved side-chain to 

Fig. 2. Atomic model (yellow bonds) for parts of the HSOD Cys-6 → Ala, Cys-111 → Ser thermostable double mutant, shown in typical 
electron density (pink and blue mesh). The residue numbering runs from 1 to 153 for subunit 1 and from 201 to 353 for subunit 2, preceded by 
the dimer letter. (A) Map and model showing β-strands 2-4 in subunit A2 and their environment in the crystal. The three antiparallel β-strands 
run vertically, with clear electron density for side chains, carbonyl oxygens, and water molecules (indicated by +). The A2-B2 interface (upper 
left) includes a main-chain hydrogen bond between O of A2 Asp-96 and N of B2 Glu-132. The A2-E1 interface (right) involves several side-chain 
hydrogen bonds, including the Nε1 of A2 Trp-32 to Oδ2 of E1 Asp-96. Although buried in this contact, Trp-32 (an important spectroscopic probe) 
is exposed on the surface of the dimer, as can be seen in Fig. 1B. (B) Map and model of the short Greek-key connection (loop III) and the adjacent 
active site, showing the key side-chain interactions of His-43 (labeled A243). The Cu(II) and Zn(II) ions are in high peaks of electron density 
(pink contours) above and below bridging His-63. Loop III curves around the His-43 ring at top, and the following β-strand continues down 
to Cu-ligand His-46. His-43 links the Greek-key connection to the β-barrel and the active site by hydrogen bonds from its ring nitrogens to the 
backbone carbonyls of Thr-39 and Cu-ligand His-120. 



main-chain hydrogen bonds to nonadjacent residues or li- 
gates a metal ion and forms one such hydrogen bond. 

Six of these critical side chains form hydrogen bond 
bridges between structural elements (Fig. 3B). Five are on 
β-strands, and one is in a loop near the beginning of a strand 
(Arg-143); all anchor the active-site channel by hydrogen 
bonding to main-chain atoms, usually O, of the loops (Fig. 
3B). His-43 is shown in detail in Fig. 2B. Cu-ligands His-48 
and His-120 hydrogen bond to the disulfide subloop at O61 
and to the electrostatic loop at O141 and Ow* respectively. 








Fig. 3. Conservation and variation of the subunit fold and conformationally critical side chains shown for the thermostable double mutant 
of HSOD. (A) The α-carbon backbones for the 10 superimposed HSOD subunits, showing the overall conformational accuracy. This 10-fold 
redundancy provides confident assignment of specific loop conformations and major side-chain interactions and also gives a data set suitable 
for comparison with multiple structure sets obtained for other proteins by NMR methods. The greatest variations in HSOD occur at the loop 
bends and the two termini. In the active-site channel around the Cu (gold sphere) and Zn (blue sphere), the loops superimpose closely. Two 
turns of α-helix (horizontal, at far right) provide helix dipole stabilization of the Zn(II) ion. (B) HSOD β-structure framework (blue) and loop 
regions, along with conformationally important side chains. The α-carbon backbone is shown as a ribbon colored to highlight the various loops. 
Critical side chains that form multiple side-chain to main-chain hydrogen bonds are shown in ball and stick representation, labeled by residue 
number and the one-letter amino acid code and color-coded to match the loops they stabilize: Gln-22 for the purple insertion loop II (see Fig. 
4A); His-43 for the light-blue Greek-key connection (loop III; see also Fig. 2B); Ser-59 and Arg-143 for the yellow disulfide subloop (IV); Asn-65 
and Arg-79 for the gold Zn-ligand subloop (IV); Ser-111 and Arg-115 for the green Greek-key loop VI; and Asn-86, Asn-131, and Ser-134 for 
the red electrostatic loop VII. Metal ligands His-48, His-120, and Asp-83 also make side-chain to main-chain hydrogen bonds to the loop regions. 
The Cu-ligands (His-46, His-48, His-63, and His-120) form distorted square-planar geometry, while the Zn-ligands (His-63, His-71, His-80, and 
Asp-83) are tetrahedral. The Cu and Zn are linked directly by the bridging His-63 and indirectly by the side-chain carboxylate of buried Asp-124, 
which hydrogen bonds to both a Cu and a Zn-ligating histidine. 

Asn-86 forms one hydrogen bond to the β-strand Gly-44 
nitrogen and two to the main-chain atoms of electrostatic 
loop residue Asp-124, whose side chain in turn bridges 
between Zn-ligand His-71 and Cu-ligand His-46. Arg-115 
hydrogen bonds to O m in the longer Greek-key loop and to 
Cu-ligand neighbor O49. Activity-important Arg-143 in the 
electrostatic loop hydrogen bonds to three backbone carbonyls 
in the disulfide subloop. 

One other structurally conserved hydrogen bond links 
loops IV and VII to brace the active-site channel, from the 
backbone N of Zn-ligand His-71 to the main-chain CO of 
Thr-135 in the short helix in the electrostatic loop. This 
six-residue helix (residues 132-137) is well-ordered and has a 
consistent conformation in the 10 independent subunits. The 
α-helix dipole is oriented to stabilize and be stabilized by the 
Zn-ion binding site (Fig. 3B). This helix dipole interaction, 
which was not recognized in structural analyses of bovine 
SOD, suggests that Zn binding stabilizes the appropriate 
conformation for the electrostatically important, sequence- 
conserved residues Glu-132, Glu-133, and Lys-136 shown 
computationally (7, 33) to promote recognition of the super- 
oxide anion substrate. 

Eight of the sequence-conserved, structurally consistent 
side chains form multiple side-chain to main-chain hydrogen 
bonds across bends within a loop, presumably to control 
conformation for each of the longer loop regions (Fig. 3B). 
The Ser-59 Oγ hydrogen bonds to O52, N54, N55, and 0% to 
cross-brace the major bend in the disulfide subloop. The 
Zn-ligand subloop contains hydrogen bonds from the Asn-65 
side chain to Nm, N^, and O^; from the Arg-79 side chain to 
O74, O81, and and from the Asp-83 side chain to N72 and 
N80. Within the electrostatic loop, the Asn-131 Oδ1 hydrogen 
bonds to N133 and N134 as the initiating N-cap of the helix 
discussed above. The side chain of Ser-134 stabilizes the 
major open turn within the electrostatic loop by hydrogen 
bonding to Qm, N128, and N129. 

If side-chain to main-chain hydrogen bonds help control 
conformation at longer loop bends, then they might be 
expected at loop insertions. Indeed, the two-residue insertion 



(relative to bovine SOD) in loop II of HSOD is stabilized by 
hydrogen bonds from the side chain of Gln-22 across the 
β-hairpin to O25 and O27 (Fig. 4A). Gln-22 is present in all 
sequences containing the loop insertion, suggesting that it is 
a compensatory sequence change. Ala-22 → Gln is in fact the 
only nonconservative sequence difference between bovine 
SOD and HSOD on the first four β-strands that is not fully 
solvent-exposed. Such mechanisms to accommodate loop 
insertions may be needed in evolution, since, for instance, 
random single-residue loop insertions in staphylococcal nu- 
clease are significantly destabilizing (with an average cost of 
3.8 kcal/mol) (34). 

Thermostability of the Double Mutant. To experimentally 
test the role of free disulfides in irreversible thermal inactivation 
and the role of intraloop hydrogen bonds in providing 
thermodynamic stabilization, we mutated both free cysteines 
in HSOD, separately and together, to residues naturally 
occurring in these positions in other SOD sequences: Cys-6 
→ Ala and Cys-111 → Ser. All three mutants are more stable 
to irreversible thermal inactivation, presumably due to the 
removal of the reactive thiols (18); the double mutant is more 
stable to irreversible thermal inactivation than either of the 
single mutants. 

In wild-type HSOD, Cys-111 forms weak (long) hydrogen 
bonds to O106 and N113 in only 2 of the 10 subunits, whereas 
in the thermostable HSOD double mutant, Ser-111 consistently 
makes both of these hydrogen bonds that stabilize the 
Greek-key loop (Fig. 4B). Comparison of the wild-type and 
double mutant HSOD structures at position 6 shows shifts 
similar to those observed in the bovine SOD Cys-6 → Ala 
structure (17), which completely close the cavity previously 
occupied by the sulfur atom. This results in a slight increase 
(0.1 kcal/mol) in stability to reversible denaturation relative 
to wild-type HSOD (18). In contrast, the HSOD Cys-111 → 
Ser single mutant, which not only removes a reactive thiol but 
also provides improved side-chain to main-chain hydrogen 
bonding, is 0.8 kcal/mol more stable than the wild type (18). 
These structure and stability results suggest that side-chain to 







Fig. 4. Critical pairs of side-chain to main-chain hydrogen bonds stabilize the two-residue insertion in loop II and loop VI of the thermostable 
double mutant. The structural model is shown with bonds colored by atom type: red, oxygen; blue, nitrogen; and green, carbon. (A) The 
two-residue insertion of Asn-26 and Gly-27 (white ribbon) into loop II is stabilized by a set of hydrogen bonds from Gln-22 (center) that link 
the carbonyl oxygen atoms of loop residues Ser-25 and Gly-27 back to the nearby Greek-key loop at Ser-105 and Leu-106 (whose side chain 
plugs one end of the β-barrel). The SODs from other species that have this two-residue loop insertion all have Gln in position 22. (B) Superposition 
of the wild-type (pink) and thermostable double mutant (atom-colored) structures in Greek-key loop VI. The mutant has increased conformational 
stability resulting in part from stronger side-chain to main-chain hydrogen bonds from the Oγ of mutated Ser-111 to the N of Ile-113 and the 
O of Leu-106. Similar hydrogen bonds are formed in the bovine SOD structure (not shown). In wild-type HSOD (pink), Cys-111 and Leu-106 
are pushed away from each other, and the Greek-key opens up. Although long hydrogen bonds are formed by Cys-111 in two of the 10 wild-type 
subunits, this loop is not consistently tethered in the wild type as it is in the mutant structure. 

main-chain hydrogen bonds across loop bends can provide a 
net stabilization to the folded protein. 

Implications for SOD Structure and Protein Loop Stability 
and Design. The activity and stability of Greek-key β-barrel 
proteins, such as HSOD, are strongly determined by the 
loops joining the β-strand framework. Loops have been 
categorized by structural patterns (35), and roles of individual 
residues in controlling antibody loop conformations have 
been identified (36, 37). The present results identify 14 
sequence-conserved, structurally consistent residues that 
form multiple side-chain to main-chain hydrogen bonds crit- 
ical for HSOD active-site stereochemistry, for local loop 
conformations, and for the stable connection of different 
structural elements. Reliable characterization of these criti- 
cal side chains benefited greatly from their structural con- 
sistency in the 10 independent HSOD subunits. Comparison 
of the bovine SOD and HSOD structures suggests that 
side-chain to main-chain hydrogen bonds can compensate for 
loop insertions. The structure and stability of the (HSOD 
Cys-6 → Ala, Cys-111 → Ser) mutant demonstrate that 
intraloop side-chain to main-chain hydrogen bonds can 
thermodynamically stabilize the enzyme. The hydrogen bonding 
patterns and the α-helix dipole interaction tie the Zn site to 
the loop involved in electrostatic recognition of the anion 
substrate. Taken together, these results provide a structural 
basis for loop redesign in HSOD and other β-barrel proteins, 
in order to alter activity, stability, and binding functions. 
Biochemistry. In the article "Atomic structures of wild-type 
and thermostable mutant recombinant human Cu,Zn superoxide 
dismutase" by Hans E. Parge, Robert A. Hallewell, 
and John A. Tainer, which appeared in number 13, July 1992, 
of Proc. Natl. Acad. Sci. USA (89, 6109-6113), the authors 
request that the following corrections be noted. Due to 



printer's errors, Fig. 1 was incorrectly cropped and Fig. 2 was 
not in focus. The corrected figures are shown below. In 
addition, in Figs. 1-4, locants A and B were inadvertently left 
off; therefore, throughout the paper, panels A and B in Figs. 
1-4 correspond to the left and right panels, respectively. 







Fig. 1. Packing of the HSOD molecules in the crystal lattice, viewed down the c axis with the a axis horizontal and the b axis vertical. Five 
HSOD dimers (10 subunits) make up the crystallographic asymmetric unit. (A) Electron density map contours (blue) rendered in a solid raster 
view to show the interlocked packing of HSOD dimers in the ab plane, producing both high-resolution diffraction and the open lattice. The 
five-dimer, dogbone-shaped, asymmetric units (which lie along diagonals from the lower left to the upper right) are formed by two overlapping 
trimers of dimers. Packing of the next ab layer of the crystal lattice (not shown) constricts the open channels slightly, to about 80 Å in diameter. 
(B) The five HSOD dimers in an alternative asymmetric unit. The α-carbon backbones are shown with thicker tubes, and the individual side 
chains are shown with thinner tubes. Cu (gold sphere) and Zn (blue sphere) ions show the locations of the active sites. Subunits are color-coded 
in dimer pairs. The green A, blue B, and yellow C dimers form one trimer, with the A2, B2, and C2 subunits associating around a sulfate group 
(red O and yellow S atoms). The purple D and red E dimers form two-thirds of the next trimer of dimers (upper right), completed by the C dimer 
of the adjacent asymmetric unit. The D1, E2, and C1 subunits of this second trimer surround another sulfate group (upper right). The 
diamond-shaped links formed by the green, blue, purple, and red dimers form the zigzag chains running diagonally through the lattice from upper 
left to lower right in A. These chains are cross-braced by the yellow C dimers, which are exposed to the large lattice cavities and form the middle 
of the dog-bone building block. In the next layer of the crystal lattice, generated by the twofold screw axis along c through the blue B dimer, 
the next B dimer will pack on top of itself, but the next red E dimer will be offset below B and beside the yellow C dimer. 





Fig. 2. Atomic model (yellow bonds) for parts of the HSOD Cys-6 → Ala, Cys-111 → Ser thermostable double mutant, shown in typical 
electron density (pink and blue mesh). The residue numbering runs from 1 to 153 for subunit 1 and from 201 to 353 for subunit 2, preceded by 
the dimer letter. (A) Map and model showing β-strands 2-4 in subunit A2 and their environment in the crystal. The three antiparallel β-strands 
run vertically, with clear electron density for side chains, carbonyl oxygens, and water molecules (indicated by +). The A2-B2 interface (upper 
left) includes a main-chain hydrogen bond between O of A2 Asp-96 and N of B2 Glu-132. The A2-E1 interface (right) involves several side-chain 
hydrogen bonds, including the Nε1 of A2 Trp-32 to Oδ1 of E1 Asp-96. Although buried in this contact, Trp-32 (an important spectroscopic probe) 
is exposed on the surface of the dimer, as can be seen in Fig. 1B. (B) Map and model of the short Greek-key connection (loop III) and the adjacent 
active site, showing the key side-chain interactions of His-43 (labeled A243). The Cu(II) and Zn(II) ions are in high peaks of electron density 
(pink contours) above and below bridging His-63. Loop III curves around the His-43 ring at top, and the following β-strand continues down 
to Cu-ligand His-46. His-43 links the Greek-key connection to the β-barrel and the active site by hydrogen bonds from its ring nitrogens to the 
backbone carbonyls of Thr-39 and Cu-ligand His-120. 
Structure of the catalytic domain of the hepatitis C 
virus NS2-3 protease 

Ivo C Lorenz 1 *, Joseph Marcotrigiano 1 *, Thomas G. Dentzer 1 & Charles M. Rice 1 



Hepatitis C virus is a major global health problem affecting an 
estimated 170 million people worldwide 1 . Chronic infection is 
common and can lead to cirrhosis and liver cancer. There is no 
vaccine available and current therapies have met with limited 
success 2 . The viral RNA genome encodes a polyprotein that 
includes two proteases essential for virus replication 3,4 . The 
NS2-3 protease mediates a single cleavage at the NS2/NS3 junc- 
tion, whereas the NS3-4A protease cleaves at four downstream 
sites in the polyprotein. NS3-4A is characterized as a serine 
protease with a chymotrypsin-like fold 5,6 , but the enzymatic 
mechanism of the NS2-3 protease remains unresolved 7-9 . Here 
we report the crystal structure of the catalytic domain of the NS2-3 
protease at 2.3 Å resolution. The structure reveals a dimeric 
cysteine protease with two composite active sites. For each active 
site, the catalytic histidine and glutamate residues are contributed 
by one monomer, and the nucleophilic cysteine by the other. The 
carboxy-terminal residues remain coordinated in the two active 
sites, predicting an inactive post-cleavage form. Proteolysis 
through formation of a composite active site occurs in the context 
of the viral polyprotein expressed in mammalian cells. These 
features offer unexpected insights into polyprotein processing 
by hepatitis C virus and new opportunities for antiviral drug 
design. 

Crystallization of the catalytic domain of the hepatitis C virus 
(HCV) NS2-3 protease (NS2 pro , consisting of residues 94-217 of 
NS2; Fig. 1a) using native and selenomethionine-containing protein 
yielded two crystal forms with the same space group (P2₁, Supplementary 
Table 1). The asymmetric units of the native and 
selenomethionine-containing protein contained twelve and six 
molecules organized into six and three tightly packed dimers, 
respectively (Supplementary Fig. 1). 

The NS2 pro monomer consists of two subdomains connected by an 
extended linker (Fig. 1b and Supplementary Fig. 2a). The amino-terminal 
subdomain contains two antiparallel α-helices (H1 and 
H2) followed by several turns and loops that contact both H1 and 
H2. The polypeptide chain continues into an extended region 
before entering a four-stranded, antiparallel β-sheet in the 
C-terminal subdomain. The last β-strand continues to the C 
terminus of NS2. Figure 1c, d represents views of the NS2 pro 
dimer, which resembles a 'butterfly' with two-fold symmetry along 
the vertical axis (Fig. 1c). The N-terminal subdomain of one 
molecule interacts with the C-terminal subdomain of the other 
molecule and vice versa. The two extended linkers cross over in the 
middle of each molecule and each contribute a β-strand (b1) to the 
antiparallel β-sheet in the C-terminal subdomain of the other 
molecule. The N termini of the two monomers lie relatively close 
to each other, whereas the solvent-exposed C termini are positioned 
on opposite sides of the molecule. 



Critical residues for NS2-3 proteolytic activity 7,8 , His 143 and 
Glu 163, are located in the loop region following helix H2 in the 
N-terminal subdomain, whereas another critical residue, Cys 184, lies 
at the end of the linker arm in the b1-b2 loop of the C-terminal 
subdomain. At the dimer interface, the histidine and glutamate 
residues from one monomer are close to the cysteine from the 
other chain (Fig. 2a). The arrangement of these three residues is 
suggestive of a composite cysteine protease active site (Fig. 2b and 
Supplementary Fig. 2b). 

The NS2 pro structure represents a novel protein fold (DALI server 10 
Z scores of less than 3.0). However, superimposing His 143, Glu 163 
and Cys 184 of NS2 pro with the active sites from cysteine proteases 
such as papain 11 (Fig. 2c) and poliovirus 3C protease 12 (Fig. 2d) 
demonstrated a similar spatial distribution. The orientation of the 
catalytic cysteine residues from NS2 pro and 3C pro is similar to the 
catalytic serine residues of Sindbis virus capsid 13 (Fig. 2e) and the 
cellular protease subtilisin 14 (Fig. 2f). Thus, like poliovirus 3C pro , 
HCV NS2-3 is a cysteine protease with a serine protease active site 
geometry 15 . To the best of our knowledge, NS2 pro represents the first 
example of a cysteine or serine protease that forms a dimer contain- 
ing a pair of composite active sites. However, these features are 
reminiscent of retroviral aspartic proteases, which consist of dimers 
with a single active site at the dimerization interface 16 . Other 
proteases such as caspases 17 require dimerization for activity, but 
they do not contain composite active sites. 

Interestingly, Pro 164, which is entirely conserved in all HCV 
sequences, has a cis-peptide conformation (Fig. 2b and Supplementary 
Fig. 2c). Pro 164 may bend the peptide backbone of the catalytic 
Glu 163 to establish the correct geometry of the glutamate side chain 
for catalysis. In addition, the cis-proline may contribute to dimer 
stabilization as the linker connecting the two subdomains follows 
Pro 164. 

The backbone carboxylic acid of the C-terminal residue of NS2 pro , 
Leu 217, remains coordinated in the active site via contacts to the side 
chains of His 143 and Cys 184, and to the backbone nitrogen of 
Cys 184 (Fig. 2b). Comparison of the NS2 pro structure with 
protease-inhibitor complexes indicates that the C terminus of NS2 may exert 
an inhibitory function after cleavage 8 . Sindbis virus capsid protein 
mediates a single autoproteolytic cleavage at its C terminus, which 
remains bound to the active site inhibiting further catalysis 13 . The 
C-terminal residues of Sindbis virus capsid (Trp 264) and HCV NS2 
(Leu 217) are in the same orientation relative to the catalytic triad 
(Fig. 2e). Comparing the structure of NS2 pro with subtilisin bound to 
the inhibitor Eglin-C 14 demonstrates that the C terminus of NS2 pro 
occupies a position equivalent to the inhibitor (Fig. 2f). We propose a 
model in which each NS2 pro molecule catalyses a single NS2-3 
cleavage event, allowing tightly regulated processing. However, we 
cannot rule out the possibility that NS2 mediates proteolysis of other 
viral or cellular proteins if the C-terminal β-strand (b5) is displaced 
from the active site. 

Processing at the NS2/NS3 junction requires the NS3 protease 
domain and is stimulated by the addition of exogenous zinc 7,8,18,19 . 
However, because NS2 contains a complete cysteine-protease active 
site with the C terminus positioned for catalysis, the function of NS3 
remains undetermined. The crystal structure of NS2 pro represents the 
post-cleavage form, which may differ from the NS2-3 precursor. NS3 
may interact with the highly conserved surface surrounding the NS2 
active site (Supplementary Fig. 3), contributing to a functional 
Figure 1 | Processing of the HCV polyprotein and architecture of NS2 pro . 

a, Membrane association of the HCV polyprotein, which contains a core, 
envelope proteins E1 and E2, p7, and nonstructural (NS) proteins NS2, NS3, 
NS4A, NS4B, NS5A and NS5B. Cleavage in the structural region and at the N 
terminus of NS2 occurs by action of the host signal peptidase (filled 
arrowheads) and host signal peptide peptidase (diamond). The NS2-3 
protease, which consists of residues 94-217 of NS2 (blue) and residues 
1-181 of NS3 (green), cleaves at the NS2/NS3 junction (blue arrow). All 
cleavages downstream of NS3 are mediated by the NS3-4A protease (green 
arrows). The model shown here depicts NS2 with three N-terminal 
transmembrane segments followed by the protease domain on the 
cytoplasmic face of the endoplasmic reticulum (ER) membrane 19 . 
Alternatively, the NS2 C terminus may be localized to the ER lumen 28,29 . Cyt, 
cytosol. b, Ribbon diagram showing the NS2 monomer. The N and C 
termini, and secondary structure elements (helices H1 and H2; β-strands 
b1-b5), are labelled. c, Ribbon diagram showing the NS2 dimer with one 
monomer in blue, the other in red. The N and C termini, and secondary 
structure elements of each monomer, are indicated. d, Ribbon diagram of 
the NS2 dimer viewed perpendicular to helices H1 and H2. This view is a 90° 
rotation around the horizontal axis in c. 



catalytic environment and correct positioning of the scissile bond. 
The backbone nitrogen of Cys 184 contacts the carboxylic acid of 
Leu 217 and may serve as part of the oxyanion hole to stabilize the 
transition state during catalysis. A residue in uncleaved NS2-3 (either 
a backbone nitrogen of NS2 oriented differently in the pre-cleavage 
form or a residue within the NS3 serine protease domain) may also 
contribute to the oxyanion hole. The zinc requirement may be due to 
its structural function in NS3 (refs 5, 6) rather than a role in NS2-3 
catalysis. Thus, limiting zinc could indirectly inhibit NS2-3 cleavage 
by affecting NS3 folding 20 . Consistent with this idea, the NS2-3 
protease is inhibited by mutations in zinc-coordinating residues of 
NS3 (refs 7, 8). 

Molecular surface analysis and biochemical data support the 
NS2 pro dimer model. NS2 pro shows a high degree of amino-acid 
sequence conservation at the interface between the two monomers 
(Supplementary Fig. 3). The cleavage rate of purified NS2-3 is 
concentration dependent, indicating that the active form of the 
protease is oligomeric 19 . Analytical ultracentrifugation of NS2 pro 
yielded a single, monodisperse species with a molecular weight of 
39 kDa that most likely corresponds to a dimer with bound detergent 
(data not shown). Moreover, cross-linking of NS2 pro in solution with 
Figure 2 | The active site of NS2 and comparison with other proteases. 

a, Location of the two active sites in the NS2 dimer (boxed regions). The 
solid-lined box represents the active site displayed in b-f. The distance 
between the two active sites is indicated. b, The NS2 active site. Residues 
His 143, Glu 163, Pro 164, Cys 184 and Leu 217 are shown as stick drawings. 
The active site is composed of His 143 and Glu 163 from one molecule of the 
dimer (chain A, drawn in blue), and Cys 184 from the other molecule (chain 
B, drawn in red). The C-terminal residue, Leu 217, originates from the same 
chain as Cys 184. Dashed lines indicate contacts between selected residues. 
The length of each contact is provided. c-f, Superimposition of His 143, 
Glu 163 and Cys 184 of NS2 with similar catalytic sites of the proteases papain 
(c), poliovirus 3C protease (d), Sindbis virus capsid (e) and subtilisin bound to 
the inhibitor Eglin-C (coloured yellow) (f). The orientation and colouring of 
the NS2 active site is identical to b, with the other proteases shown in grey. 
In e, the C-terminal residue Trp 264 of Sindbis virus capsid is labelled. 
disuccinimidyl suberate (DSS) led to the identification of a dimeric 
species (Supplementary Fig. 4). 

A series of experiments in mammalian cells was designed to test 
whether NS2 pro can form dimers with a functional composite active 
site in vivo. HCV full-length polyproteins containing either a H143A 
or a C184A mutation in the NS2 active site are defective in NS2-3 
processing 7,8 . However, if a composite active site can form, co- 
expression of the two mutant polyproteins should result in partial 
NS2-3 cleavage (Fig. 3a). Indeed, when HCV polyproteins with NS2 
containing either a H143A or C184A mutation were co-expressed, 
NS2 and NS3 cleavage products were detected (Fig. 3b), indicating 
the formation of a functional composite active site. Moreover, it is 
possible to predict from the crystal structure which mutant poly- 
peptide in the mixing experiment is cleaved, because the C-terminal 
Leu 217 and the catalytic Cys 184 originate from the same chain. Thus, 
mixing of the two mutants should lead to cleavage of NS2-3(H143A), 
whereas NS2-3(C184A) is predicted to remain unprocessed. By 
expressing NS2-3 proteins with a Flag or haemagglutinin (HA) tag 
fused to the N terminus of NS2, it is possible to distinguish Flag-NS2 
from HA-NS2 by immunoprecipitation using epitope-specific 
antibodies and by different electrophoretic mobilities of the various 
polypeptides (Fig. 3c). Mixing of Flag-NS2-3(H143A) with HA- 
NS2-3(C184A) resulted in cleavage of Flag-NS2-3, whereas HA- 
NS2-3 remained unprocessed (Fig. 3c, second lane from the right). 
When HA-NS2-3(H143A) was mixed with Flag-NS2-3(C184A), 
only HA-NS2-3 was cleaved (Fig. 3c, first lane from the right). 
Mixing wild-type Flag-NS2-3 with double-mutant HA-NS2- 
3(H143A/C184A) or vice versa yielded only cleaved wild-type NS2 
(Fig. 3d), whereas the double-mutant polypeptide remained unpro- 
cessed because neither of the composite active sites is functional when a 
wild-type and a double-mutant NS2-3 dimerize. Finally, when cells are 
co-transfected with wild-type Flag-NS2-3 and HA-NS2-3 and lysed 
with a mild detergent, Flag-NS2 and HA-NS2 can be co-precipitated 
using either an anti-Flag or an anti-HA antibody (Fig. 3e). These data 
strongly support the NS2 pro crystal structure and prove that NS2 can 
form dimers with composite functional active sites. 
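
The predictions tested in the mixing experiments can be sketched as a small toy model. This is an illustration only, not part of the original work. It assumes that every pairwise dimer forms in the cell, and that a chain is cleaved exactly when its own Cys 184 is intact and its dimer partner supplies intact His 143 and Glu 163 (following the observation that Leu 217 and Cys 184 come from the same chain). The names `Chain`, `cleaved_in_dimer` and `cleaved_species` are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations_with_replacement


@dataclass(frozen=True)
class Chain:
    """One NS2-3 polypeptide, described by the state of its catalytic residues."""
    name: str
    his_glu_ok: bool  # His 143 / Glu 163 intact (assumed to act on the partner chain)
    cys_ok: bool      # Cys 184 intact (assumed to cleave its own chain at Leu 217)


def cleaved_in_dimer(donor: Chain, target: Chain) -> bool:
    """True if the composite site (His/Glu from donor, Cys from target) is functional.

    In this toy model a functional site cleaves the chain that contributes the Cys.
    """
    return donor.his_glu_ok and target.cys_ok


def cleaved_species(chains):
    """Names of chains predicted to be cleaved when all pairwise dimers can form."""
    cleaved = set()
    for x, y in combinations_with_replacement(chains, 2):
        if cleaved_in_dimer(x, y):
            cleaved.add(y.name)
        if cleaved_in_dimer(y, x):
            cleaved.add(x.name)
    return cleaved


WT = Chain("WT", his_glu_ok=True, cys_ok=True)
H143A = Chain("H143A", his_glu_ok=False, cys_ok=True)
C184A = Chain("C184A", his_glu_ok=True, cys_ok=False)
DOUBLE = Chain("H143A/C184A", his_glu_ok=False, cys_ok=False)

# Each single mutant alone is predicted to be inactive; mixing them rescues
# cleavage of the H143A chain only, and a wild-type chain is not predicted to
# rescue the double mutant.
print(cleaved_species([H143A]))
print(cleaved_species([C184A]))
print(cleaved_species([H143A, C184A]))
print(cleaved_species([WT, DOUBLE]))
```

Under these assumptions the model reproduces the qualitative pattern reported for Fig. 3c and 3d: only the H143A chain is processed in the single-mutant mix, and only the wild-type chain is processed in the wild-type/double-mutant mix.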

Our data change the current view of HCV polyprotein processing 
and raise interesting regulatory possibilities. Previously, NS2-3 
cleavage was thought to occur as a unimolecular reaction in cis. 
The apparent requirement for dimerization to form an active NS2-3 
protease suggests that NS2-3 cleavage and therefore formation of the 
active RNA replicase may be dependent on the concentration of NS2. 
Thus, a requirement for NS2-3 dimerization and subsequent pro- 
teolytic processing could delay the initiation of RNA replication, 
which may allow the virus to accumulate sufficient amounts of active 
NS3-4A protease to antagonize the induction of type 1 interferons by 
the host 21 . 

The cleaved NS2 dimer and higher-order oligomers may have 
additional roles in the viral life cycle, such as a function in mem- 
brane-associated virus assembly 22 . The solvent-accessible surface of 
the NS2 pro dimer, coloured according to electrostatic potential 
(Fig. 4a, b), showed a high proportion of neutral and basic regions. 
The surface of the molecule near helices HI and H2 is mainly 
hydrophobic, with basic residues lying underneath. The crystal 
structure of NS2 pro contained several molecules of n-octyl-β-glucoside 
and n-decyl-β-maltoside interacting with those two helices 
(Fig. 4d, e and Supplementary Fig. 2d). We propose a model in 
which NS2 pro interacts peripherally with cellular membranes 
(Fig. 4c, f). The N termini of two NS2 pro monomers would lie 



Figure 3 | Dimerization of NS2 and formation of a composite active site in 
mammalian cells. a, Mixing of two HCV polyproteins, each containing a 
point mutation in the NS2 active site (H143A or C184A; shown in blue and 
red, respectively) leads to the formation of NS2-3 mutant homo- and 
hetero-dimers. Dimerization between NS2(H143A) and NS2(C184A) yields 
one defective and one functional active site, resulting in partial NS2-3 
cleavage. The protease domain of NS3 is shown in green. b, Mixing 
experiment in Huh-7.5 cells of HCV full-length polyproteins containing a 
wild-type (WT), single- (H or C) or double-mutant (H/C) NS2 active site, 
followed by metabolic labelling, cell lysis and immunoprecipitation using 
anti-NS2 or anti-NS2-3 antibodies. c, Co-transfection of U2OS cells with 
Flag- and HA-tagged NS2-3 constructs (consisting of residues 19-217 of NS2 
and 1-181 of NS3) containing a wild-type or mutant NS2 active site as 
indicated on top of the lanes, followed by metabolic labelling, cell lysis and 
immunoprecipitation using anti-NS2, anti-Flag or anti-HA antibodies. 

d, Mixing experiment of Flag-NS2-3 and HA-NS2-3 containing a 
wild-type or double-mutant NS2 active site, as described above. 

e, Co-immunoprecipitation using anti-Flag or anti-HA antibodies of 
co-transfected Flag-NS2-3 and HA-NS2-3, lysed either in SDS or CHAPS 
buffer. Positions of the NS2, NS3 and NS2-3 proteins are shown by 
arrowheads on the right. Molecular weight markers are indicated on the left. 
H, H143A; C, C184A; H/C, H143A/C184A. 



© 2006 Nature Publishing Group 



833 



LETTERS 



NATURE | Vol 442 | 17 August 2006 









Figure 4 | Model for membrane association of NS2. a, Solvent-accessible 
surface of the NS2 dimer coloured according to electrostatic potential: acidic 
(red), neutral (white) and basic (blue). b, Electrostatic surface of the NS2 
dimer rotated 90° around the horizontal axis shown in a. c, Electrostatic 
surface of the NS2 dimer inserted peripherally into a cellular membrane. 
d, Ribbon diagram of the NS2 dimer oriented as in a, with three n-octyl-β-glucoside 
molecules and one n-decyl-β-maltoside molecule bound to the 
N-terminal subdomains. e, Ribbon diagram of the NS2 dimer oriented as in 
b, showing the four bound detergent molecules. f, Ribbon diagram of the 
NS2 dimer positioned relative to the membrane. In this model, the 
hydrophobic part of helix H2 from each subunit is peripherally inserted into 
the lipid bilayer, with basic amino acids on the side of the helix involved in 
neutralizing the charge of polar lipid head groups in the membrane. The 
N-terminal transmembrane segments of NS2 (shown as dotted lines) would 
extend into the membrane. 



close to the membrane (Fig. 4f). In the full-length protein, dimer- 
ization of NS2 may cause the N-terminal transmembrane domains 
of two monomers to form a 'bundle' of transmembrane segments 
that may serve an important function during virion morphogenesis. 

The NS2 pro structure presented here will allow further studies to 
elucidate the role of NS2 dimerization and other functions of NS2 
in the viral life cycle. In addition, the structure establishes a 
foundation for the design of small-molecule inhibitors directed 
against the well-defined active site cleft and other conserved features 
of the protein. 

METHODS 

See Supplementary Information for detailed methods. 

Protein preparation. NS2 pro (NS2 residues 94-217), a truncated form of NS2 
shown to be proteolytically active in the context of an NS2-3 precursor that 
includes the NS3 protease domain 18,19 , was expressed in Escherichia coli. Lysis and 
subsequent purification of the protein by immobilized-metal affinity chroma- 
tography, ion exchange chromatography and gel filtration was done in the 
presence of detergent. The final concentration on a cation exchange column 
yielded highly pure protein at 6-9 mg ml⁻¹. 

Crystal growth and freezing. Crystals of NS2 pro were grown using hanging- 
drop vapour diffusion at 4°C. The well contained a solution of 0.1 M Tris, 
pH 8.5, 0.8 M ammonium acetate, 0.25 M lithium chloride and 12% PEG 3350. 
Crystals were frozen in well solution supplemented with 5.4 mM decyl maltoside 
using stepwise addition of glycerol to a final concentration of 25%. 
Data collection and structure determination. Data were collected at beamlines 
X9A and X29 at the Brookhaven National Laboratory's National Synchrotron 
Light Source. Phases were calculated from selenomethionine-containing protein 
by multiwavelength anomalous diffraction (MAD). After data indexing and 
scaling with DENZO/SCALEPACK 23 , 19 of the 24 selenium sites were found 
using SnB 24 . An interpretable electron density map was obtained using 
MLPHARE 25 , followed by density modification and phase combination using 
SOLOMON and DM 25 . Several rounds of iterative model building and 
refinement were done using O 26 and CNS 27 . The 2.9 Å resolution structure was used as 
a search model to obtain phases for a native data set at 2.28 Å resolution using 
molecular replacement. The final model contained 176 solvent molecules, 12 
detergent molecules, and 12 molecules of NS2 pro . A summary of the refinement 
statistics is shown in Supplementary Table 1. 

Expression in mammalian cells. NS2 containing wild-type or mutant active site 
residues was expressed in mammalian cells in the context of a full-length viral 
polyprotein or as NS2-3 precursor with an N-terminal Flag or HA tag. Cells 
metabolically labelled with 35S were lysed and immunoprecipitation of NS2 
performed using antibodies against NS2 or the Flag and HA tags. Proteins were 
separated by SDS-polyacrylamide gel electrophoresis and visualized by 
autoradiography. 
The hepatitis C virus H strain (HCV-H) polyprotein is cleaved to produce at least 10 distinct products, in 
the order of NH2-C-E1-E2-p7-NS2-NS3-NS4A-NS4B-NS5A-NS5B-COOH. An HCV-encoded serine proteinase 
activity in NS3 is required for cleavage at four sites in the nonstructural region (3/4A, 4A/4B, 4B/5A, and 
5A/5B). In this report, the HCV-H serine proteinase domain (the N-terminal 181 residues of NS3) was tested 
for its ability to mediate trans-processing at these four sites. By using an NS3-5B substrate with an inactivated 
serine proteinase domain, trans-cleavage was observed at all sites except for the 3/4A site. Deletion of the 
inactive proteinase domain led to efficient trans-processing at the 3/4A site. Smaller NS4A-4B and NS5A-5B 
substrates were processed efficiently in trans; however, cleavage of an NS4B-5A substrate occurred only when 
the serine proteinase domain was coexpressed with NS4A. Only the N-terminal 35 amino acids of NS4A were 
required for this activity. Thus, while NS4A appears to be absolutely required for trans-cleavage at the 4B/5A 
site, it is not an essential cofactor for serine proteinase activity. To begin to examine the conservation (or 
divergence) of serine proteinase-substrate interactions during HCV evolution, we demonstrated that similar 
trans-processing occurred when the proteinase domains and substrates were derived from two different HCV 
subtypes. These results are encouraging for the development of broadly effective HCV serine proteinase 
inhibitors as antiviral agents. Finally, the kinetics of processing in the nonstructural region was examined by 
pulse-chase analysis. NS3-containing precursors were absent, indicating that the 2/3 and 3/4A cleavages occur 
rapidly. In contrast, processing of the NS4A-5B region appeared to involve multiple pathways, and significant 
quantities of various polyprotein intermediates were observed. NS5B, the putative RNA polymerase, was found 
to be significantly less stable than the other mature cleavage products. This instability appeared to be an 
inherent property of NS5B and did not depend on expression of other viral polypeptides, including the 
HCV-encoded proteinases. 



Hepatitis C viruses (HCVs) have recently been recognized 
as agents of the parenterally transmitted form of non-A, non-B 
hepatitis (17, 41). Virtual elimination of HCV-contaminated 
blood has greatly reduced the incidence of posttransfusion 
hepatitis; however, HCV remains responsible for a significant 
proportion of community-acquired hepatitis (1). In most cases, 
HCV is not cleared and establishes a chronic infection that can 
be associated with chronic hepatitis and more severe liver 
disease such as cirrhosis and hepatocellular carcinoma (63). 
For these reasons, there is considerable interest in developing 
additional HCV-specific antiviral agents that can complement 
currently available alpha interferon therapy, which effectively 
controls disease in only a minority of HCV-infected patients. 

At least 15 full-length HCV genome sequences, as well as 
partial sequences for many other isolates, have been reported 
(see reference 60 and citations therein). These data indicate 
the existence of multiple genotypes that can diverge by as much 
as 50% at the amino acid level (10, 64, 65). This group of 
related viruses is now classified as a separate genus in the 
family Flaviviridae (27), which includes two other genera, 
Flavivirus (12) and Pestivirus (20). The positive-strand HCV 
genome RNA is approximately 9.4 kb in length and contains a 
highly conserved 5' noncoding region followed by a long open 
reading frame encoding a polyprotein of 3,010 to 3,033 amino 
acids (36, 51). Because a cell culture system supporting effi- 
cient HCV replication is lacking, efforts to define potential 
HCV-encoded polypeptides have utilized expression of HCV 
cDNA in cell-free translation and in cell cultures. The HCV 
polyprotein appears to be cleaved at multiple sites to produce 
at least 10 structural and nonstructural (NS) proteins (47). The 
order and nomenclature of these cleavage products for the 
HCV H strain (HCV-H) are NH2-C-E1-E2-p7-NS2-NS3- 
NS4A-NS4B-NS5A-NS5B-COOH, where C, E1, and E2 are 
putative structural proteins and the remaining NS proteins are 
believed to be replicase components (30-32, 47). Host signal 
peptidase in the endoplasmic reticulum lumen appears to 
catalyze cleavages in the structural-NS2 region (C/E1, E1/E2, 
E2/p7, and p7/NS2 sites) (33, 47), whereas an HCV-encoded 
serine proteinase located in the N-terminal one-third of the 
NS3 protein is responsible for four cleavages in the NS region 
(3/4A, 4A/4B, 4B/5A, and 5A/5B sites) (5, 22, 30, 34, 50, 69). 
Autocatalytic cleavage at the 2/3 site is mediated by a second 
HCV-encoded proteinase that encompasses the NS2 region 
and the NS3 serine proteinase domain (31, 35). 

In this study, we tested the ability of the NS3 serine 
proteinase domain (called NS3 181 ) to mediate fra/w-processing 
at each of the four downstream sites. All four sites could be 
cleaved in trans; however, requirements for fra/w-cleavage 
varied for different sites, franj-cleavage at the 3/4A site was 
very inefficient, if there was any, when the substrate contained 
an inactivated serine proteinase. Coexpression of NS4A is 
required for cleavage at the 4B/5A site, but not at the 5A/5B 
site. We also tested the ability of the serine proteinases from 
two HCV subtypes (H and BK strains) to mediate trans-processing
of heterologous HCV polyprotein substrates. Finally,
we used a vaccinia virus recombinant expressing the
entire HCV polyprotein to examine the processing kinetics in
the NS region and the stability of HCV precursors and
cleavage products. 

MATERIALS AND METHODS 

Cell cultures. The BHK-21 and CV-1 cell lines were obtained
from the American Type Culture Collection, and the
BSC-40 cell line (9) was obtained from D. Hruby (Oregon 
State University). Cell monolayers were grown in Eagle's 
minimal essential medium (MEM) supplemented with 2 mM 
L-glutamine, nonessential amino acids, penicillin, streptomy- 
cin, and 10% fetal bovine serum (FBS). The A16 subclone of 
the human hepatoma HepG2 cell line, generously provided by 
Alan Schwartz (Washington University), was maintained in 
Dulbecco's modified Eagle medium supplemented with peni- 
cillin, streptomycin, and 10% FBS. 

Plasmid constructions. Standard recombinant DNA tech- 
niques (61) were used for construction of the expression 
plasmids described below. For all plasmids, regions of HCV-H 
coding sequence amplified by PCR were verified by DNA 
sequence analysis. 

Synthetic oligonucleotides and PCR were used to engineer 
initiation or termination codons as well as convenient restriction
sites for subcloning (5' NcoI and 3' XhoI sites) for several
HCV-H expression constructs. These constructs (with the 
encoded polyproteins given in parentheses) are as follows (Fig. 
1): pTM3/HCV1027-1657 (NS3), pBRTM/HCV1027-1711 
(NS3-4A), pTM3/HCV1658-1711 (NS4A), pTM3/HCV1658- 
1972 (NS4A-4B), pTM3/HCV1658-2420 (NS4A-5A), pTM3/ 
HCV1712-2420 (NS4B-5A), pBRTM/HCV1712-3011 (NS4B-5B),
pBRTM/HCV1973-3011 (NS5A-5B), and pTM3/HCV2421-3011
(Met-NS5B). The sequences encompassing the engineered
initiation codons (boldface) are as follows (HCV-H
sequence underlined): NS3, 5'-CCATGG£GC££-3'; NS4A,
5'-CCATGGCCAGCACC-3'; NS4B, 5'-CCATGGCGIX2CAG-3';
NS5A, 5'-CCATGGGAICCGQC-3'; and NS5B,
5'-CCATGGGCTCAATG-3'. For the engineered termination
codons (boldface), the surrounding sequences are as follows
(HCV-H sequence underlined): NS3, 5'-GTCACGTGACTCGAG-3';
NS4A, 5'-AGTGCTAGCTCGAG-3'; NS4B,
5'-££AlIiCrAGCTCGAG-3'; and NS5A, 5'-IQClQCrAGCTCGAG-3'.
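
The construct names above encode polyprotein coordinates, so protein lengths and residue positions follow by simple arithmetic. The sketch below is a minimal illustration, assuming that the endpoints in the construct names coincide with the mature protein boundaries; the dictionary and function names are hypothetical and introduced only for this example.

```python
# First and last HCV-H polyprotein residues of each NS protein, read off
# the construct names (for example, pTM3/HCV1658-1711 encodes NS4A).
# Assumption: construct endpoints equal the mature protein boundaries.
BOUNDARIES = {
    "NS3": (1027, 1657),
    "NS4A": (1658, 1711),
    "NS4B": (1712, 1972),
    "NS5A": (1973, 2420),
    "NS5B": (2421, 3011),
}


def length(protein):
    """Number of residues in the named protein."""
    first, last = BOUNDARIES[protein]
    return last - first + 1


def polyprotein_position(protein, residue):
    """Convert a residue number within a protein (1-based) into its
    position in the HCV-H polyprotein."""
    first, last = BOUNDARIES[protein]
    if not 1 <= residue <= last - first + 1:
        raise ValueError(f"{protein} has no residue {residue}")
    return first + residue - 1


if __name__ == "__main__":
    for name in BOUNDARIES:
        print(name, length(name), "residues")
    # Residue 181 of NS3 maps to the end point of pTM3/HCV1027-1207.
    print(polyprotein_position("NS3", 181))
```

The same helper relates truncated constructs to their names: residue 297 of NS5A and residue 88 of NS5B correspond to the end points of pTM3/HCV2269-2508.
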

pTM3/Ubiquitin-HCV2421-3011 (Ubi-NS5B) was con- 
structed by ligation of two PCR-derived fragments into pTM3/ 
HCV2421-3011 (Met-NS5B). The initiating methionine of the 
ubiquitin monomer corresponds to the ATG in the NcoI site of
pTM3. The ubiquitin (double underlined)-NS5B (underlined)
junction was created by using a BamHI restriction site (boldface)
as follows: CGC GGT GGA TCC ATG TCT. The
template for PCR amplification of the ubiquitin cassette was 
pTM3/Ub-nsP4 (Tyr) (44). 

Additional HCV-H expression plasmids (with the encoded 
polyproteins given in parentheses) were constructed by subcloning
appropriate fragments from previously described constructs
(Fig. 1). pTM3/HCV1027-1207 (NS3 181) was derived
from pTM3/HCV1027-1657 (described above) and pTM3/
HCV827-1207 (31); pTM3/HCV1027-1676 (NS3-4A 19) was
derived from pTM3/HCV1027-1657 and pBRTM/HCV1-1676
(32); pTM3/HCV1027-1692 (NS3-4A 35) was derived from
pTM3/HCV1027-1657 and pBRTM/HCV1-1692 (32);
pBRTM/HCV1027-3011 S1165A (NS3-5B*) was derived from
pBRTM/HCV1-3011 S1165A (30) and pBRTM/HCV1027-1711;
pTM3/HCV1193-1657 (NS3 167-631) was derived from pBRTM/
HCV1193-3011 (30) and pTM3/HCV1027-1657; and pTM3/
HCV1193-1711 (NS3 167-4A) was derived from
pBRTM/HCV1193-3011, pBRTM/HCV1027-1711, and pTM3. pTM3/HCV
FIG. 1. HCV genome structure and expression constructs. (A)
Diagram of the HCV-H strain polyprotein and its cleavage products
shown as boxes. The identities of the mature proteins, including C, E1,
E2, p7, NS2, NS3, NS4A, NS4B, NS5A, and NS5B, are indicated (32, 
47). The number at the top of each cleavage product indicates the 
position of its N-terminal residue in the polyprotein sequence. The 
apparent molecular masses for HCV proteins (p) and glycoproteins 
(gp) are indicated under each product (in kilodaltons). Regions 
containing predominantly uncharged amino acids are indicated as 
black bars. Also shown are putative cleavage sites for host signal 
peptidase (♦) (33, 47), the HCV NS2-3 proteinase (0) (31, 34), and the 
NS3 serine proteinase (^) (5, 22, 30, 34, 50, 69). (B) HCV-H 
polypeptide expression constructs used in this study. HCV polypeptide 
sequences present in each pBRTM/HCV or pTM3/HCV construct are 
indicated by black lines, which are drawn to scale and oriented with
respect to the diagram of the HCV-H polyprotein. Numbers at the 
ends of each line refer to the first and last amino acids of the HCV 
polypeptide expressed by the particular construct. For simplicity, the 
NS prefix is not used for the nomenclature of each encoded polypeptide,
which is indicated on the left. (C) HCV-BK polypeptide expression
constructs. (See the legend to panel B for details.)



1658-1676 (NS4A 19) was constructed by deleting the HincII-NheI
fragment of pTM3/HCV1658-1711 (the NheI site was
filled in by using T4 DNA polymerase prior to ligation).
pTM3/HCV1658-1692 (NS4A 35) was generated by deleting the
NaeI-NheI fragment of pTM3/HCV1658-1711 (the NheI site
was filled in by using T4 DNA polymerase prior to ligation).
pTM3/HCV2269-2508 (NS5A 297-5B 88) was made by subcloning
the 1,274-bp BsaI-BglII fragment from pTM3/HCV1-2508
(32) into pTM3 digested with NcoI and BglII (the BsaI and
NcoI sites were filled in by T4 DNA polymerase prior to
ligation). pTM3/HCV2285-2508 (NS5A 313-5B 88) was constructed
by subcloning the 1,227-bp ApaI-BglII fragment of
pTM3/HCV1-2508 (32) into pTM3, which had been previously
digested with NcoI and BglII (the ApaI and NcoI cleavage sites
were trimmed and filled in, respectively, by T4 DNA polymerase
prior to ligation).

Expression constructs for the HCV BK strain (HCV-BK) 
were made with cDNA clones generously provided by H. 
Okayama and A. Takamizawa (67). pTM3/HCV-BK1027-1207
[encoding polypeptide NS3 181 (BK)] was constructed by subcloning
a PCR fragment amplified from pUC19/BK-146 (67)
into pTM3. The sequences encompassing the engineered initiation
and termination codons (boldface) include (HCV-BK
sequences underlined) an NcoI site at the N terminus
(5'-CCATGGCTCCC-3') and a BamHI site at the C terminus
(5'-CQQICrrAATAGGATCC-3'). pBRTM/HCV-BK1221-
3011 was produced by subcloning appropriate fragments from 
four HCV-BK cDNA clones, including pUC19/BK-102, BK- 
112-1, BK-112-5, and BK-166. Because the HCV-BK coding 
sequence in pUC19/BK-102 clones was fused in frame to the 
AUG codon in the NcoI site of the adaptor sequence, pBRTM/
HCV-BK1221-3011 encodes a polyprotein [NS3 195 -5B(BK)] 
encompassing HCV-BK residues 1221 to 3011 after the initi- 
ating methionine. 

Generation and growth of vaccinia virus-HCV recombi- 
nants. vHCV1027-1207 was generated by marker rescue of 
pTM3/HCV1027-1207 (49). Recombinant viruses were plaque 
purified three times under gpt selection (25) prior to growth of 
large-scale stocks. A vaccinia virus-HCV recombinant encoding
the entire HCV-H open reading frame, vHCV1-3011, has
been described previously (47). Stocks of vHCV1027-1207,
vHCV1-3011, and vTF7-3, a vaccinia virus recombinant expressing
the T7 DNA-dependent RNA polymerase (28), were
grown in BSC-40 monolayers and partially purified (37), and 
titers of infectious progeny were determined by plaque assay 
on BSC-40 cells (37). 

Transient expression with the vaccinia virus-T7 hybrid 
system. For expression assays utilizing vaccinia virus-HCV 
recombinants, monolayers of HepG2-A16 or BHK-21 cells in 
35-mm-diameter dishes were infected with vTF7-3 alone or in 
combination with vHCV1-3011, vHCV827-3011 (32), or
vHCV1027-1207. The multiplicity of infection for each recombinant
was 10 PFU per cell. After adsorption for 60 min at
room temperature, the inoculum was removed and replaced 
with MEM containing 2% FBS. Expression assays of trans- 
fected plasmid constructs utilized subconfluent monolayers of 
BHK-21 cells that had been previously infected with vTF7-3 as 
described above. Some of them were also coinfected with 
vHCV1027-1207. After removal of the inoculum, cells were 
transfected for 2 h at 37°C with a mixture consisting of 1 µg of
plasmid DNA and 10 µg of Lipofectin (Bethesda Research
Laboratories) in 0.5 ml of MEM. If two constructs were used
in a single transfection, the amount of each plasmid varied
from 0.5 µg to 1 µg, with a total of 1.5 µg of DNA mixed with
15 µg of Lipofectin.
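
The amounts quoted above imply a fixed 10:1 mass ratio of Lipofectin to DNA and an inoculum of 10 PFU per cell for each recombinant. The following sketch only restates that arithmetic; the function names are hypothetical, and the cell count in the usage lines is an arbitrary illustrative value, not a figure from the paper.

```python
# Ratios taken from the text: 10 ug Lipofectin per 1 ug plasmid DNA
# (also 15 ug per 1.5 ug), and 10 PFU per cell for each recombinant.
LIPOFECTIN_UG_PER_UG_DNA = 10.0
MOI_PFU_PER_CELL = 10


def lipofectin_ug(total_dna_ug):
    """Lipofectin mass (ug) for a given total plasmid DNA mass (ug)."""
    if total_dna_ug <= 0:
        raise ValueError("DNA mass must be positive")
    return LIPOFECTIN_UG_PER_UG_DNA * total_dna_ug


def pfu_required(cell_count, moi=MOI_PFU_PER_CELL):
    """Infectious units needed to infect cell_count cells at the given
    multiplicity of infection."""
    if cell_count < 0:
        raise ValueError("cell count cannot be negative")
    return cell_count * moi


if __name__ == "__main__":
    print(lipofectin_ug(1.0))       # single-plasmid transfection
    print(lipofectin_ug(1.5))       # two-plasmid transfection
    print(pfu_required(1_000_000))  # hypothetical dish of 10^6 cells
```
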

For pulse-chase experiments, monolayers were washed once 
with prewarmed methionine-deficient MEM at 3 h postinfec- 
tion and incubated in the same medium for 20 min at 37°C. 
Cells were labeled by incubation for 20 min at 37°C with 
methionine-deficient MEM supplemented with 100 µCi of
35S-protein labeling mixture (NEN) per ml. For chase experiments,
the labeling mixture was replaced with MEM containing
2% FBS, 1.5 mg of methionine per ml, and 100 µg of
cycloheximide per ml and incubated for the indicated periods 
at 37°C. For steady-state labeling, cell monolayers were washed 
once at 3 h postinfection as described above and then were 
incubated for 4 h at 37°C with MEM containing 1/40th the
normal concentration of methionine and cysteine, 2% FBS,
and 40 µCi of 35S-protein labeling mixture per ml.
Cell lysis, immunoprecipitation, and protein analyses. After
labeling, cell monolayers were washed with phosphate-buff- 
ered saline and lysed with a solution of 0.5% sodium dodecyl 
sulfate (SDS), 50 mM Tris-Cl (pH 7.4), 1 mM EDTA, and 20
µg of phenylmethylsulfonyl fluoride per ml (0.3 ml/10^6 cells).
Cellular DNA was sheared by repeated passage through a
27.5-gauge needle. Prior to immunoprecipitation (47), lysates
were heated to 70°C for 10 min. Portions of each lysate were
incubated either with 5 to 10 µl of the indicated rabbit
polyclonal antisera or with 2 µl of serum JHF from an
HCV-positive patient (32). Immune complexes were collected 
by using Staphylococcus aureus Cowan I (Calbiochem) as 
described previously (59), solubilized, and analyzed by SDS- 
polyacrylamide gel electrophoresis (PAGE) (42) or Tricine- 
SDS-PAGE (62). After treatment for fluorography with 
En3Hance (DuPont), gels were dried and exposed at -70°C
with prefogged (43) X-ray film (Kodak). 14C-methylated molecular
weight marker proteins were purchased from Amersham.

Cell-free translation. The 5'-uncapped RNA transcripts
were synthesized from linearized cDNA templates with T7
DNA-dependent RNA polymerase (Epicenter) (58). Cell-free
translation mixtures with rabbit reticulocyte lysates (Promega)
and [35S]methionine (Amersham) were incubated for 1 h at
30°C essentially according to the manufacturer's instructions.
The translation reactions were terminated by the addition of
RNase A (Boehringer Mannheim) to 10 µg/ml, cycloheximide
to 0.3 mg/ml, and cold methionine to 1 mM. A portion of the 
translation reaction mixtures was removed at the indicated 
time, diluted 10-fold with the Laemmli sample buffer, heated 
for 5 min at 95°C, and analyzed by SDS-PAGE as described 
above. 

RESULTS 

trans-Cleavage at all four serine proteinase-dependent sites.
The serine proteinase domain of HCVs was initially identified 
on the basis of sequence homology to members of the trypsin 
superfamily (7, 29). The predicted domain is approximately 
180 residues and corresponds to the N-terminal one-third of 
NS3. This enzyme is required for processing in the NS3-4-5 
region of the HCV polyprotein, and alanine substitutions for 
predicted active site residues (His-1083 or Ser-1165 for 
HCV-H) abolish cleavage at the 3/4A, 4A/4B, 4B/5A, and
5A/5B sites (5, 22, 30, 34, 50, 69). To purify and characterize 
this enzyme, we have used the vaccinia virus-T7 hybrid expres- 
sion system to examine the ability of the predicted serine 
proteinase domain, expressed as an individual polypeptide 
(NS3 181), to mediate trans-cleavage at each of these four sites.

The first substrate examined was an NS3-5B polyprotein 
containing the Ala substitution at Ser-1165 (NS3-5B*) (Fig. 1). 
This mutation completely inactivates the serine proteinase, 
and no processed products were observed (Fig. 2). When 
coexpressed with NS3 181, cleavage occurred at the 4A/4B,
4B/5A, and 5A/5B sites, as evidenced by the appearance of 
NS4B (Fig. 2B), NS5A (Fig. 2C), and NS5B (Fig. 2D). In 
contrast, we observed a more-slowly-migrating NS3-specific 
product, presumably NS3-4A, in addition to a very faint band 
corresponding to NS3 (Fig. 2A). This suggests that very 
inefficient, if any, trans-cleavage occurred at the 3/4A site of
this substrate. 

The lack of trans-cleavage at the 3/4A site has been observed
in other studies and has led to the proposal that this site can 
only be cleaved in cis (5, 69). However, all substrates examined 
thus far contained an inactivated NS3 serine proteinase do- 
main, which might interfere with the accessibility of the 3/4A 
FIG. 2. trans-Processing of the HCV-H NS3-5B* polyprotein.
BHK-21 cell monolayers were infected with vTF7-3 alone (m) or in
combination with vHCV827-3011 (v) or vHCV1027-1207 (3 181). Some
monolayers were also transfected with pBRTM/HCV1027-3011 S1165A
(3-5B*). Cells were metabolically labeled with 35S-protein labeling
mixture as described in Materials and Methods. Cell lysates were
immunoprecipitated with the following HCV-specific antisera:
NS3-specific WU117 (A), NS5A-specific WU123 (C), NS5B-specific
WU115 (D), or human patient serum JHF (B). It should be noted that
the NS3 serine proteinase domain is not recognized by either human
patient serum JHF or rabbit antiserum WU117, which was raised
against the NS3 helicase domain. Immunoprecipitated proteins were
solubilized and separated by electrophoresis on 8% (A, C, and D) or
14% (B) polyacrylamide-SDS gels. HCV-specific proteins are indicated
on the right, and the sizes of 14C-labeled protein molecular mass
markers (in kilodaltons) are indicated on the left.



site for trans-cleavage. To test this possibility, we expressed a
polyprotein, NS3 167 -5B, which begins with residue 167 of NS3 
and therefore lacks the majority of the serine proteinase 
domain. Marker proteins were also expressed beginning with 
NS3 residue 167 and extending to the C terminus of NS3 
(NS3 167-631) or NS4A (NS3 167-4A) (Fig. 1). Processed products
were not observed when NS3 167-5B was expressed alone
(Fig. 3). During coexpression with NS3 181, two NS3-specific
cleavage products were observed: a major product comigrating
with NS3 167-631 and traces of a larger species comigrating with
NS3 167-4A (Fig. 3). These results clearly demonstrate that
NS3 181 can mediate efficient trans-cleavage at the 3/4A site of
a substrate which lacks the inactivated proteinase domain. 

In contrast to the flaviviruses, where NS2B is absolutely
required for NS3 serine proteinase activity (11, 24, 57), HCV 
sequences upstream of NS3 are not required for serine pro- 
teinase-dependent cleavages (5, 22, 30). However, the poten- 
tial role of downstream viral polypeptide sequences in prote- 
olysis has not been examined. To address this possibility, we 
tested trans-cleavage of NS4A-4B, NS4B-5A, and NS5A-5B
substrates, each of which contained only a single proteinase- 
dependent cleavage site (Fig. 1). When expressed alone, only 
FIG. 3. Requirements for trans-cleavage at the 3/4A site. BHK-21
cell monolayers were infected with vTF7-3 alone (m) or in combination
with vHCV827-3011 (v) or vHCV1027-1207 (3 181). Some monolayers
were also transfected with pBRTM/HCV1193-3011 (3 167-5B),
pTM3/HCV1193-1657 (3 167-631), or pTM3/HCV1193-1711 (3 167-4A).
Cells were labeled with 35S-protein labeling mixture as described in
Materials and Methods. HCV NS3-specific products were immunoprecipitated
with rabbit antiserum WU117, solubilized, and analyzed by
SDS-PAGE (8% polyacrylamide). HCV-specific proteins are indicated
on the right, and the sizes of 14C-labeled protein molecular mass
markers (in kilodaltons) are indicated on the left.



the appropriate unprocessed polyproteins were present (Fig. 
4). When coexpressed with NS3 181, NS4A-4B was processed to
yield NS4A and NS4B (Fig. 4A), and NS5A-5B yielded NS5A
and NS5B (Fig. 4C). To develop shorter substrates convenient
for in vitro proteinase assays, we examined trans-processing of
NS5A 297-5B 88 and NS5A 313-5B 88, which contain the C-terminal
152 and 136 residues of NS5A, respectively, followed by the
N-terminal 88 amino acids of NS5B (Fig. 1). NS5A 297-5B 88 was
processed efficiently by NS3 181, as evidenced by the conversion
of most of NS5A 297-5B 88 to NS5A 297-448. Nearly complete
trans-cleavage at the 5A/5B site was also observed for
NS5A 313-5B 88 (Fig. 4D). These results indicate that only
limited flanking sequences are necessary for efficient trans-cleavage
at the 5A/5B site by the NS3 181 serine proteinase.
Since these substrates do not overlap, these data exclude an
absolute requirement for one of the downstream viral polypeptides
for serine proteinase activity. In contrast to the results
with the NS4A-4B, NS5A-5B, and NS3-5B* substrates, however,
no trans-cleavage of NS4B-5A was observed (Fig. 4B).

NS4A is required for cleavage at the 4B/5A site. Since 
trans-cleavage at the 4B/5A site occurred for the NS3-5B*
substrate but not for NS4B-5A, we examined trans-cleavage of
NS4A-5A and NS4B-5B polyprotein substrates (Fig. 1). When
coexpressed with NS3 181, the 4B/5A cleavage occurred in the
NS4A-5A substrate (Fig. 5A, lane 4) but not in NS4B-5B (data
not shown). These results suggested that, in addition to the
NS3 181 proteinase domain, NS4A was required for cleavage at
the 4B/5A site. The requirement for NS4A was strengthened
by the observation that processing of NS4B-5A was restored by
coexpression of NS4A in trans. Cleavage at the 4B/5A site of
this substrate occurred when NS4A was expressed either as
part of the proteinase (NS3-4A) (Fig. 5A, lane 9) or as an
individual polypeptide (NS4A) together with NS3 181 (Fig. 5A,
lane 7). These results clearly demonstrate that NS3-mediated 
cleavage at the 4B/5A site requires NS4A, a small protein of 54 
amino acids with a hydrophobic N-terminal half and a C- 
terminal half rich in charged residues (see Discussion). 

Since NS3-4A was fully active for trans-cleavage at the
4B/5A site, we made two constructs with C-terminal deletions 
FIG. 4. trans-Processing of HCV-H polyproteins containing only
one serine proteinase-dependent site. BHK-21 cell monolayers were
infected with vTF7-3 alone (m) or in combination with vHCV827-3011
(v) or vHCV1027-1207 (3 181). As indicated, some monolayers were
also transfected with pTM3/HCV1658-1972 (4A-4B), pTM3/
HCV1712-2420 (4B-5A), pBRTM/HCV1973-3011 (5A-5B), pTM3/
HCV2269-2508 (5A 297-5B 88), or pTM3/HCV2285-2508 (5A 313-5B 88).
These BHK-21 cells were labeled with 35S-protein labeling mixture as
described in Materials and Methods. Cell lysates were immunoprecipitated
with human patient serum JHF (A) or the following HCV-specific
rabbit antisera: NS5A-specific WU123 (B and C), NS5B-specific
WU115 (C), and WU113 specific for both NS5A and NS5B
(D). Apparently, rabbit antiserum WU113, which was raised against a
fusion protein containing the C-terminal 109 residues of NS5A and the
N-terminal 203 residues of NS5B, recognizes only the NS5A region but
not the NS5B sequences in 5A 297-5B 88 and 5A 313-5B 88. Immunoprecipitated
proteins were solubilized and separated by electrophoresis on
14% (A and D) or 8% (B and C) polyacrylamide-SDS gels. HCV-specific
proteins are indicated on the right, and the sizes of 14C-labeled
protein molecular mass markers (in kilodaltons) are indicated on the
left. In panel A, NS4A is difficult to visualize because it contains only
a single methionine residue (compared with six in NS4B) and migrates
as a diffuse band on this gel system.



in the NS4A region to map NS4A sequences required for this 
activity. NS3-4A 35 and NS3-4A 19 contain the full-length NS3 
followed by the N-terminal 35 and 19 residues of NS4A, 
respectively (Fig. 1). As evidenced by production of NS5A,
NS3-4A 35 (Fig. 5A, lane 10), but not NS3-4A 19 (lane 11), was
able to process NS4B-5A. In an earlier study, similar constructs
were generated to map the location of NS4A (32). A polyprotein
beginning with the C protein and extending through the
N-terminal 35 residues of NS4A was efficiently processed at
the 3/4A site. However, a C-terminal truncation to residue 19
of NS4A appeared to block cleavage at the 3/4A site (32).
Thus, the inability of NS3-4A 19 to function for trans-cleavage
of NS4B-5A might result from lack of cleavage at the 3/4A site
and release of the NS4A N terminus rather than deletion of
NS4A residues 20 to 35. To address this possibility, we
examined the activity of polypeptides encompassing the N-terminal
19 and 35 residues of NS4A (called NS4A 19 and
NS4A 35, respectively). NS4A 35, but not NS4A 19, was able to
induce trans-cleavage of NS4B-5A by NS3 181 (Fig. 5B). These
results indicate that the C-terminal 19 amino acids (residues 36 
to 54) of NS4A, which contain 8 to 9 highly conserved, charged 
FIG. 5. Requirements for trans-cleavage at the 4B/5A site. BHK-21
cell monolayers were infected with vTF7-3 alone (m) or in combination
with vHCV827-3011 (v) or vHCV1027-1207 (3 181). As indicated,
some monolayers were also transfected with the following plasmids:
pTM3/HCV1658-2420 (4A-5A), pTM3/HCV1712-2420 (4B-5A),
pTM3/HCV1658-1711 (4A), pTM3/HCV1027-1657 (3), pTM3/HCV1027-
1676 (3-4A 19), pTM3/HCV1027-1692 (3-4A 35), pBRTM/HCV1027-
1711 (3-4A), pTM3/HCV1658-1676 (4A 19), and pTM3/HCV1658-1692
(4A 35). Cells were labeled with 35S-protein labeling mixture as described
in Materials and Methods. HCV-specific products were immunoprecipitated
with NS5A-specific antiserum WU123 (A) or human
patient serum JHF (B), solubilized, and separated by 8% (A) or 10%
(B) polyacrylamide-SDS gels. HCV-specific proteins are indicated on
the right, and the sizes of 14C-labeled protein molecular mass markers
(in kilodaltons) are indicated on the left.



residues (see Discussion), are not required for trans-cleavage
at the 4B/5A site. 
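
The truncation mapping described above can be summarized as a small decision rule. The sketch below is an interpretation, not the authors' analysis: it records the three N-terminal NS4A fragments that were tested (19, 35, and the full 54 residues) and returns None for lengths between 19 and 35, which the experiments do not resolve. Treating every fragment of 35 or more residues as active, and every fragment of 19 or fewer as inactive, is an assumption made here for illustration; the names are hypothetical.

```python
from typing import Optional

NS4A_LENGTH = 54  # residues, as stated in the text

# Observed outcomes for N-terminal NS4A fragments coexpressed with NS3 181:
# does the fragment support trans-cleavage of NS4B-5A at the 4B/5A site?
OBSERVED = {19: False, 35: True, 54: True}


def supports_4b5a_cleavage(n_terminal_residues: int) -> Optional[bool]:
    """Predict cofactor activity for an N-terminal NS4A fragment.

    Returns True or False where the data (plus the monotonicity
    assumption stated above) allow a call, and None for the
    untested interval of 20 to 34 residues.
    """
    if not 1 <= n_terminal_residues <= NS4A_LENGTH:
        raise ValueError("fragment length must be between 1 and 54")
    if n_terminal_residues >= 35:
        return True
    if n_terminal_residues <= 19:
        return False
    return None


if __name__ == "__main__":
    for n in (19, 27, 35, 54):
        print(n, supports_4b5a_cleavage(n))
```
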

Cleavage at the 3/4A and 4A/4B sites, which flank NS4A, 
may also require NS4A sequences for efficient cleavage (see 
Discussion). However, since the 5A/5B site can be efficiently 
cleaved in the absence of NS4A (Fig. 4C and D), this protein 
is not absolutely required for NS3 serine proteinase activity. 
For development of an in vitro proteinase assay that does not 
require NS4A, substrates containing the 5A/5B site should be 
good candidates. 

trans-Cleavage between HCV-H and HCV-BK strains. Viral
proteinases, which are important for polyprotein processing 
and viral replication, present attractive targets for develop- 
ment of antiviral therapeutic agents. Since sequence analysis of 
HCV isolates has uncovered considerable genetic diversity, the 
success of a proteinase inhibitor strategy will depend at least in 
part on the conservation of proteinase-substrate interactions 
among different HCV types. In one classification scheme, six 
major genotypes or types (from 1 to 6) are distinguished, with 
some types further divided into related subtypes (64, 65). 
HCV-H (26) and HCV-BK (67) are members of the 1a and 1b
subtypes, respectively, which represent the major subtypes in 
the United States and Japan. These two strains share 90 and 
87% amino acid sequence identities in the serine proteinase 
domain and in the NS3-5B region, respectively. To examine 
the conservation or divergence of proteinase-substrate inter- 
actions among different HCV strains, we compared the ability 
FIG. 6. trans-Cleavage between HCV-H and HCV-BK polypeptides.
BHK-21 cell monolayers were infected with vTF7-3 alone (m) or
in combination with vHCV827-3011 (v). For trans-cleavage experiments,
the polyprotein substrates were pBRTM/HCV1193-3011 for
the H strain (H) or pBRTM/HCV-BK1221-3010 for the BK strain
(BK). The serine proteinase domains of both strains were expressed as
the source of the proteolytic activities: vHCV1027-1207 for the H
strain and pTM3/HCV-BK1027-1207 for the BK strain. The absence
(-) of certain expression constructs is indicated. Cells were labeled
with 35S-protein labeling mixture as described in Materials and Methods.
HCV-specific products were immunoprecipitated with the following
antisera: NS3-specific WU117 (A), NS5A-specific WU123 (C),
NS5B-specific WU115 (D), or human patient serum JHF (B). The
immunoprecipitated proteins were solubilized and separated on 8%
(A, C, and D) or 14% (B) polyacrylamide-SDS gels. HCV-specific
proteins are indicated on the right, and the sizes of 14C-labeled protein
molecular mass markers (in kilodaltons) are indicated on the left.
FIG. 7. Pulse-chase analysis of processing in the NS3-4-5 regions.
HepG2-A16 cells were infected with vTF7-3 alone (m) or coinfected
with vTF7-3 and vHCV1-3011 (v), pulse labeled with 35S-protein
labeling mixture for 20 min, and chased for the indicated time as
described in Materials and Methods. Cell lysates were prepared and
immunoprecipitated with the following antisera: NS3-specific WU117
(A), NS5A-specific WU123, or NS5B-specific WU115 (C) or human
patient serum JHF (B). Immunoprecipitated proteins were solubilized
and separated by SDS-PAGE (8% polyacrylamide) (A and C) or
Tricine-SDS-PAGE (14% polyacrylamide) (B). The migration pattern
of HCV NS5A-specific polyprotein markers is shown on the left in
panel C. BHK-21 cells previously infected with vTF7-3 were mock
transfected (m) or transfected with the indicated plasmids and labeled
with 35S-protein labeling mixture as described in Materials and Methods.
HCV-specific proteins are identified on the right, and the sizes of
14C-labeled protein molecular mass markers (in kilodaltons) are
indicated on the left.



of the NS3 serine proteinases of the HCV-H or the HCV-BK 
strain to mediate trans-cleavage of homologous or heterologous
polyprotein substrates.

For the H strain, we used the NS3 181 proteinase and the
NS3 167-5B substrate described above (Fig. 1). For the
HCV-BK proteinase, we made a similar construct expressing
the N-terminal 181 amino acids of HCV-BK NS3 [NS3 181
(BK)]. The HCV-BK substrate was a polyprotein beginning
with residue 195 of NS3 and extending through NS5B
[NS3 195-5B(BK)]. When NS3 167-5B was coexpressed with NS3 181,
processing at all four sites occurred, as evidenced by the
appearance of NS3 167-631, NS4A, NS4B, NS5A, and NS5B
(Fig. 6). For the BK strain, NS3 181 (BK) was able to mediate
trans-cleavage at the 3/4A, 4A/4B, and 4B/5A sites of
NS3 195-5B(BK), as indicated by the production of NS3 195-631 (BK),
NS4A, and NS4B (Fig. 6A and B). Thus far, we have been
unable to identify the HCV-BK NS5A and NS5B cleavage
products by using HCV-H NS5A- or NS5B-specific rabbit
antisera or HCV-positive patient antisera collected in the
United States. As shown in Fig. 6, the serine proteinase
domain of either strain was fully active at mediating trans-cleavage
of the heterologous substrate from the other strain.
NS3 181 (BK) cleaved NS3 167-5B of the H strain to NS3 167-631,
NS4A, NS4B, NS5A, and NS5B (Fig. 6). Likewise, NS3 195-



5B(BK) was processed by NS3 181 of the H strain to produce
NS3 195-631 (BK), NS4A, and NS4B (Fig. 6A and B). Thus, at
least as assessed by this trans-processing assay, these two
different HCV subtypes do not appear to have diverged 
significantly in terms of serine proteinase-substrate recogni- 
tion. 

Kinetics of processing in the HCV NS region. Besides 
defining the minimal domains required for serine proteinase 
activity, it is also of interest to understand the processing 
reactions that occur in the full-length HCV polyprotein. In 
other viral systems, polyprotein cleavages that occur in cis 
versus those occurring in trans can be important for regulating 
RNA replicase function. Such regulation is possible when 
polyproteins, processing intermediates, and mature cleavage
products have distinct roles in replication (44). To begin to 
examine processing pathways and kinetics in the NS3-4-5 
region, pulse-chase experiments were carried out in HepG2-A16
cells by using a vaccinia virus-HCV recombinant, vHCV1-3011,
which expresses the entire HCV-H polyprotein (47). As
shown in Fig. 7A, NS3 was readily visible after a 20-min pulse 
and was not associated with any higher-molecular-mass 
polyprotein precursors, indicating that cleavage at both the 2/3 
and 3/4A sites occurs very rapidly, possibly in cis. 

In contrast, processing in the NS4-5 region was generally 
slower and a number of processing intermediates were readily
identified. As shown in Fig. 7B, NS4B was readily visible after 
a 20-min pulse. A 29-kDa protein comigrating with the product 
expressed from pTM3/HCV1658-1972 (NS4A-4B) was identified
as the NS4A-4B polyprotein (Fig. 7B). A decrease in the
level of NS4A-4B was accompanied by an increase in the 
amount of NS4B, which suggests that NS4A-4B can be a 
precursor for NS4B and NS4A. NS4A was not observed in this 
experiment, probably because of its low methionine content 
(only one) and inefficient expression by vHCV1-3011 (32).
Four predominant NS5A-containing polyproteins of 160, 135, 
87, and 82 kDa were observed after the 20-min pulse (Fig. 7C). 
NS4-specific antiserum recognized all of these species except 
for the 135-kDa polyprotein (data not shown), whereas the
NS5B-specific antiserum recognized only the 160- and 135-kDa 
polyproteins (Fig. 7C). On the basis of their apparent molec- 
ular mass, immunoreactivity, and comigration with marker 
polyproteins (Fig. 7C), these four polyproteins were tentatively 
identified as NS4-5B (160 kDa), NS5A-5B (135 kDa), 
NS4A-5A (87 kDa), and NS4B-5A (82 kDa). It is unclear 
whether the 160-kDa polyprotein NS4-5B begins with NS4A or 
NS4B or is a mixture of both of these species. The presence of 
these four polyproteins suggests that there are several alterna- 
tive pathways for processing the NS4-5 region (see Discussion 
for more details). Over a 60-min chase, the level of NS5A (58 
kDa) increased significantly and was accompanied by a de- 
crease in the levels of NS5A-5B and NS4-5B (Fig. 7C), 
suggesting that these two polyproteins may be the precursors 
to NS5A. Because the levels of NS4B-5A and NS4A-5A 
increased initially, and then decreased during the chase period 
(Fig. 7C), they probably represent processing intermediates 
between NS4-5B and NS5A. An NS5A-specific protein of 62 
kDa (indicated as b in Fig. 7C), barely detectable after 15 min 
of chase, became more apparent after 60 min. In a previous 
study, several minor NS5A-specific species with slower mobil- 
ity were observed in addition to the dominant 58-kDa NS5A 
protein (32). Two additional faint bands of 107 and 47 kDa 
(labeled a and c, respectively, in Fig. 7C) were observed with 
the NS5B-specific antiserum. Product a was also recognized by
NS5A-specific antiserum. These two proteins remain to be
defined, but they may reflect additional proteolytic processing
within the NS5B region. Although NS4-5B and NS5A-5B were 
likely precursors to NS5B, the level of NS5B did not change 
significantly over a 60-min chase period (Fig. 7C), and this 
protein appeared to be unstable relative to most of the other 
HCV-encoded proteins (see below). On the other hand, NS3 
(Fig. 7A) and NS4B (Fig. 7B) were stable up to 2 h, while a 
slight decrease in the level of NS5A was observed (Fig. 7C). 

Instability of the NS5B protein. While NS3 was very stable
during prolonged chase periods, NS5A and, in particular, 
NS5B, the putative HCV RNA polymerase, appeared to be 
rather unstable. NS5A disappeared with a half-life of approx- 
imately 170 min, and NS5B disappeared with a half-life of 
about 70 min (data not shown). This observation is potentially 
interesting because some positive-strand viruses tightly regu- 
late the level of their RNA-dependent RNA polymerase. 
Additionally, the p75 protein of bovine viral diarrhea virus, the 
HCV NS5B homolog, is unstable in bovine viral diarrhea 
virus-infected cells (21). To determine whether other HCV- 
encoded proteins might be responsible for the instability of 
NS5B, we expressed two different forms of HCV NS5B. One 
form (Met-NS5B) included the entire NS5B region preceded 
by two non-HCV residues, Met-Gly. A second construct
encoded a ubiquitin fusion protein consisting of the 76-residue 
ubiquitin monomer fused in frame to the N terminus of NS5B 
(Ubi-NS5B). Cleavage of this ubiquitin fusion protein by 

FIG. 8. Stability of NS5B expressed in BHK-21 cells or by cell-free
translation. (A) BHK-21 cells previously infected with vTF7-3 were
mock transfected (m) or transfected with one of the following plasmids:
pTM3/HCV2421-3011 (Met-5B) or pTM3/Ubiquitin-HCV2421-3011
(Ubi-5B). The cell monolayers were pulse-labeled with 35S-protein
labeling mixture for 20 min and chased for the indicated times
as described in Materials and Methods. Cell lysates were prepared, and 
HCV NS5B-specific products were immunoprecipitated by the rabbit 
antiserum WU115. Immunoprecipitated proteins were solubilized and 
separated by SDS-PAGE (8% polyacrylamide). HCV-specific proteins
are identified on the right, and the sizes of 14C-labeled protein
molecular mass markers (in kilodaltons) from Amersham are indicated 
on the left. (B) Translations with RNA transcripts from pTM3/ 
HCV2421-3011 or pTM3/Ubiquitin-HCV2421-3011 or without any 
transcript (m) were incubated for 60 min at 30°C in reticulocyte lysate 
in the presence of [35S]methionine. Translation reactions were terminated
by the addition of RNase A, cycloheximide, and excess cold
methionine and then were chased for the indicated times. The 
translation products were solubilized and analyzed by SDS-PAGE 
(14% polyacrylamide). The identities of proteins are shown on the 
right, and the sizes of 14C-labeled protein molecular mass markers (in
kilodaltons) are indicated on the left. 



cellular ubiquitin carboxy-terminal hydrolase should produce 
NS5B with its authentic N-terminal Ser residue (4). As shown
in Fig. 8A, the Ubi-NS5B fusion protein was completely 
processed to NS5B after a 20-min pulse of transfected BHK-21 
cells. Both forms of the NS5B proteins were unstable, as 
evidenced by the rapid decline in the level of NS5B (Fig. 8A). 
The approximate half-lives were 90 min for Met-NS5B and 70 
min for NS5B produced by cleavage of Ubi-NS5B. Coexpression
of NS3(181) had no significant effect on the stability of NS5B
(data not shown). These results indicate that the instability of 
NS5B is not due to the presence of other HCV-encoded 
proteinases or proteins. Rather, NS5B is inherently unstable 
and is probably degraded through a cellular pathway. 
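
The half-lives quoted above can be converted into an expected fraction of labeled protein remaining at a given chase time. The following is only a minimal sketch: it assumes simple first-order (single-exponential) decay, which the text does not state, and the function name `fraction_remaining` is illustrative.

```python
def fraction_remaining(chase_min: float, half_life_min: float) -> float:
    """Fraction of pulse-labeled protein left after a chase period.

    Assumes first-order decay (an assumption made for this sketch,
    not a result reported in the text).
    """
    if half_life_min <= 0:
        raise ValueError("half-life must be positive")
    return 0.5 ** (chase_min / half_life_min)


# Approximate half-lives quoted in the text, in minutes.
half_lives = {"Met-NS5B": 90.0, "NS5B from Ubi-NS5B": 70.0}

for name, t_half in half_lives.items():
    for chase in (30, 60, 120):
        left = fraction_remaining(chase, t_half)
        print(f"{name}: {left:.2f} of label left after {chase} min")
```

Under this assumption the two forms would differ only modestly over a 2-h chase, which is in line with the conclusion that both are unstable.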

In an attempt to devise an in vitro assay to study NS5B 
degradation, we examined the stability of Met-NS5B or Ubi- 
NS5B produced by cell-free translation of RNA transcripts in 
rabbit reticulocyte lysates (Fig. 8B). Translation reactions were 
terminated by addition of RNase A, cycloheximide, and excess 
cold methionine and were chased for the indicated periods. 
Control experiments showed no further incorporation of 
[35S]methionine after the addition of these three reagents
(data not shown). Ubi-NS5B fusion proteins were completely 
processed to NS5B and ubiquitin (predicted molecular mass of 
8.6 kDa) after a 60-min incubation. Although slight decreases 



8154 



LIN ET AL. 



J. Virol. 



[Fig. 9 alignment panel. Rows, by subtype: 1a (HCV-H, HCV-1, HC-J1); 1b (HCV-J, HCV-JT, HCV-BK, HCV-T, HCV-JK1); 1c (HC-G9); 2a (HC-J6); 2b (HC-J8); 3a (NZL1).]

FIG. 9. Alignment of NS4A sequences. The predicted NS4A amino acid sequences are aligned for selected HCV isolates from six subtypes 
(indicated on the left): HCV-H (38), HCV-1 (18), HC-J1 (accession no. D10749), HCV-J (39), HCV-JT (68), HCV-BK (67), HCV-T (16), 
HCV-JK1 (accession no. S18030), HC-G9 (53), HC-J6 (55), HC-J8 (54), and HCV-NZL1 (60). The single-letter code for amino acids is used.
Hyphens indicate residues identical to those of the HCV-H strain sequence. The 14-residue segment (residues 22 to 35) implicated in
trans-cleavage at the 4B/5A site is shaded. Accession numbers for unpublished sequences are given above in parentheses.



in the levels of the proteins were apparent over the 2-h chase, 
both Met-NS5B and NS5B produced by cleavage of Ubi-NS5B 
were quite stable in the reticulocyte lysates, making it difficult 
to assess the role of the ubiquitin-mediated degradation path- 
way (which is present in reticulocyte lysates [19]) in the 
turnover of NS5B. 



DISCUSSION 

It has been previously shown that an active NS3 serine 
proteinase is required for processing at four cleavage sites in 
the HCV NS3-4-5 region. The results presented here clearly 
demonstrate that the proteinase domain, expressed as a 181- 
residue N-terminal fragment of NS3, is able to mediate 
trans-cleavage at all four sites. Bartenschlager et al. (6) recently
reported similar results showing that a fragment of the 
polyprotein, including the 212 N-terminal residues of NS3 and 
a 20-residue extension into the NS2 region, could also mediate 
trans-cleavage at the 4A/4B, 4B/5A, and 5A/5B sites. Our
results, as well as those of two recent studies (6, 23), indicate 
that NS4A is absolutely required for the 4B/5A cleavage. Failla 
et al. (23) also showed that NS4A of HCV-BK, supplied in 
trans, was required for cleavage at the 3/4A and 4B/5A sites 
and improves the efficiency of processing at the 4A/4B and 
5A/5B sites. On the basis of these results, it was suggested that 
NS4A functions as a general effector or cofactor for NS3 serine 
proteinase-mediated cleavage in the NS3-4-5 region. Virus- 
encoded cofactors required for serine proteinase activity have 
also been found for other members of the family Flaviviridae. 
The most dramatic example is the NS2B protein of flaviviruses, 
which is absolutely required for NS3 serine proteinase-medi- 
ated cleavage at all structural and nonstructural dibasic sites 
(2, 11, 15, 24, 45, 48, 56, 57, 70, 72). As discussed by Failla et 
al. (23), the pestivirus p10 protein may be the functional
homolog of HCV NS4A, because sequences in this region of 
the pestivirus polyprotein appear to be required for the serine 
proteinase-dependent cleavage between p58 and p75 (the two 
C-terminal products of the pestivirus polyprotein possibly 
equivalent to HCV NS5A and NS5B, respectively) (71). For 
HCV, NS4A is required for only three cleavages mediated by 
the serine proteinase (3/4A, 4A/4B, and 4B/5A). While Failla 
et al. showed that NS4A can increase trans-cleavage efficiency
at the 5A/5B site (23), we found that certain substrates 
containing this site could be processed efficiently in the ab- 
sence of NS4A. Hence, the HCV serine proteinase-dependent 
cleavages can be separated into at least two types: (i) cleavages 



at the 3/4A, 4A/4B, and 4B/5A sites, which are located adjacent 
to hydrophobic sequences and require NS4A as a cofactor; and 
(ii) cleavage at the 5A/5B site, which can occur in the absence 
of NS4A. 

Although the mechanism(s) by which NS4A functions in 
proteolytic processing at type 1 sites remains to be determined, 
several possibilities can be envisioned. (i) NS4A may act as a
molecular chaperone to facilitate folding of the serine protein- 
ase domain into an active enzyme. If the active form of the 
proteinase is the same for cleavage at both type 1 and type 2 
sites, then this model implies that type 1 substrates are 
suboptimal and require higher concentrations of active proteinase
for efficient trans-cleavage. (ii) NS4A may bind to type
1 substrates, the proteinase domain, or both to facilitate
proteinase-substrate interactions and cleavage. (iii) NS4A may
facilitate proteolysis of membrane-associated type 1 substrates 
by interacting with the proteinase domain and localizing it to 
the membrane compartment. Given that NS4A is required for 
cleavage at three different sites, it is tempting to propose that 
it functions via direct interaction with the proteinase domain. 
Thus far, unlike the flavivirus proteinase, which consists of a 
stable complex of NS2B and NS3 (3, 14), there is no direct 
evidence for association between the HCV NS3 and NS4A 
proteins. Suggestive evidence has been obtained, however, 
from in vitro studies in which NS3 was found to become 
membrane associated when the cell-free translation product 
included the NS4A region (35). 

Although NS4A is only 54 residues in length, we showed that 
a fragment of only 35 N-terminal residues, coexpressed with 
the serine proteinase domain, was sufficient for trans-cleavage
at the 4B/5A site. Failla et al. (23) reported that a polypeptide 
consisting of the C-terminal 33 residues of NS4A and NS4B 
facilitated trans-cleavage at the 4B/5A site. Although flanking
sequences may contribute to NS4A activity, these data suggest 
that a 14-residue segment (residues 22 to 35) of NS4A may be 
critical for cleavage at the 4B/5A site. As shown in Fig. 9, the 
HCV NS4A protein sequence is highly conserved among HCV 
strains and consists of a hydrophobic N-terminal portion, a 
central region implicated in trans-cleavage at the 4B/5A site
(highlighted in Fig. 9), and a highly charged acidic C-terminal 
segment. Although somewhat less conserved than other re- 
gions of NS4A (especially in comparison with HCV-J6 and 
HCV-J8), this central region contains two positively-charged 
residues, several hydrophobic amino acids, and an absolutely 
conserved Gly at position 27. The importance of these residues 
for NS4A trans-cleavage activity is currently being tested by
site-directed mutagenesis. 

The lack of trans-cleavage at the 3/4A site in previous studies
led to the suggestion that cleavage at this site occurred in cis (5,
69). This cleavage has recently been shown to be insensitive to
dilution, providing direct evidence for a cis mechanism (6).
These observations are consistent with the results of pulse- 
chase analyses in which we (this report) and others (6) were 
unable to detect NS3-related precursors. Thus, in the current 
model, both the 2/3 and 3/4A cleavages are catalyzed by two 
distinct viral proteinases in cis. Of interest is the observation 
that substrates with an inactivated serine proteinase domain 
were resistant to trans-cleavage at the 3/4A site. Efficient
trans-cleavage was observed, however, when the inactivated
proteinase domain was deleted. Although other possibilities 
exist, these results, together with the observation that NS4A 
sequences are required for cleavage at the 3/4A site (23),
suggest that during translation of the polyprotein, the serine 
proteinase domain interacts with nascent NS4A to assume a 
conformation capable of cis-cleavage at the 3/4A site. In the
case of the substrate with the inactivated proteinase, this 
intermediate still forms but is frozen because it is inactive for 
cis-cleavage. Thus, the 3/4A site of this substrate is not 
accessible to trans-acting proteinase, because it is probably
bound in the substrate binding pocket of the inactive autopro- 
teinase. 

In contrast to the rapid cleavages observed at the 2/3 and 
3/4A sites, processing at the 4A/4B, 4B/5A, and 5A/5B sites 
was slower and appeared to involve multiple pathways (this 
study and reference 6). An obligate processing order was not 
observed, which is consistent with results from a study in which 
mutations blocking cleavage at each of these three sites had no 
significant effect on processing at other sites (40). Similar 
results have been obtained for flaviviruses (45, 46, 52, 56). It is 
important to emphasize that the processing pathways and 
kinetics observed in mammalian transient expression assays 
may not accurately reflect the situation in HCV-infected cells. 
In particular, trans-processing reactions, which are important
for temporal regulation of RNA synthesis for other viruses (for 
example, see reference 44), would be expected to be sensitive 
to the concentration of trans-acting factors, which may be
much lower in HCV-infected cells. Hence, these issues should 
be reexamined when systems become available for studying 
HCV replication in cell cultures. 

Using both the vaccinia virus-T7 and the Sindbis virus 
replicon expression systems, we found that the NS5B protein 
was unstable compared with the other polyprotein cleavage 
products (8) (Fig. 7 and 8 and data not shown). Turnover of 
NS5B was similar whether the protein was expressed as part of 
the full-length polyprotein or independently as a ubiquitin 
fusion protein. In contrast to the results in cell culture assays, 
NS5B was found to be relatively stable in reticulocyte lysates.
Since NS5B is the putative HCV RNA-dependent RNA 
polymerase (51), down-regulation of this protein could play an 
important regulatory role in virus replication, as has been 
found for the RNA polymerase of alphaviruses (see reference 
66 for a review). For the pestivirus bovine viral diarrhea virus, 
the putative RNA-dependent RNA polymerase (p75) is unsta- 
ble, with a half-life of less than 60 min in bovine viral diarrhea 
virus-infected cells (21). However, the NS5 protein of flavivi- 
ruses, which is not cleaved into two proteins, is relatively stable 
(13). In contrast to our results, Bartenschlager et al. found 
NS5B (of a strain similar to HCV-J) to be quite stable when 
expressed in HeLa cells with a vaccinia virus recombinant (6). 
The reason for the discrepancy is unclear, but it could reflect a 
difference in the sequence of the expressed NS5B protein or in 
the cells used for the expression studies. As mentioned above, 
these issues need to be reexamined in a system that supports
HCV RNA replication. 

Finally, there is considerable interest in the HCV protein- 
ases as targets for development of new antivirus therapies. The 
general usefulness of such compounds will depend in part on 
their ability to inhibit the proteinases of diverse HCV types. 
Although it will be important to test more divergent proteinase-substrate
combinations, the ability of the HCV-H serine
proteinase (subtype 1a) to trans-process an HCV-BK substrate
(subtype 1b), and vice versa, suggests that the essential elements
of recognition may be conserved. This is encouraging for
the development of broadly effective serine proteinase inhibi- 
tors. 

Bovine leukemia virus protease was purified to homogeneity and assayed by using murine leukemia virus
Pr65gag, a polyprotein precursor of the viral core structural proteins, as the substrate. A chemical analysis of
the protease, including an amino acid composition and NH2- and COOH-terminal amino acid sequence
analysis, revealed that it has an Mr of 14,000 and is encoded by a segment of the viral RNA located between
the gag gene and the putative reverse transcriptase gene. As expected from the nucleotide sequence data (Rice 
et al., Virology 142:357-377, 1985), the reading frame for the protease is different from both the gag and 
reverse transcriptase reading frames. The 5' end of the protease open reading frame extends 38 codons 
upstream from the codon for the NH2-terminal residue of the mature viral protease and overlaps the gag open
reading frame by 7 codons. The 3' end of the protease open reading frame extends 26 codons beyond the codon 
for the COOH-terminal residue of the mature protease and overlaps 8 codons of the reverse transcriptase open 
reading frame. Several lines of evidence, such as protein mapping of the gag polyprotein precursor, the 
characteristic structure of the mRNA, and promotion of the synthesis of a gag polyprotein precursor by lysine 
tRNA in vitro, suggest that the protease could be translated by frameshift suppression of the gag termination 
codon. In vitro synthesized bovine leukemia virus gag-related polyproteins were cleaved by the protease into
fragments which were the same size as the known components of bovine leukemia virus, suggesting that the 
specificity of cleavage catalyzed in vitro by the purified protease is the same as the specificity of cleavage found 
in the virus. 



Bovine leukemia virus (BLV) is the etiologic agent of 
enzootic bovine leukemia (2). Structural analyses of the 
major viral core protein have allowed this virus, along with 
the recently discovered human T-cell leukemia virus (17, 
26), to be classified as members of a new family of retrovi- 
ruses (16). Chemical analyses of the viral structural proteins 
(4, 13, 14, 21) and nucleotide sequence analyses of the 
proviral DNAs (18-20) have shown that the gene order in the
BLV genome is 5'-gag-pol-env-X-3'. The gag gene codes for
core structural proteins p15, p24, and p12; the pol gene
codes for reverse transcriptase; and the env gene codes for 
envelope proteins gp60 and p30. In addition to the structural 
proteins, several open reading frames have been observed, 
including two open reading frames in the pol gene and, as in 
human T-cell leukemia virus (23), two open reading frames 
downstream from the env gene (X region or long open 
reading frame region) (19, 20). Between the 3' end of the gag 
gene and the 5' end of the putative reverse transcriptase 
gene, the amino acid sequence deduced from the DNA 
sequence shows clear homology to the murine and avian 
viral proteases. Recently, we identified Moloney murine 
leukemia virus protease and reported that this protease is 
encoded by the gag-pol gene and is synthesized through 
suppression of the gag termination codon (UAG) (28). In this
paper we describe the purification, chemical analysis, and 
biological specificity of the BLV protease. The possible 
mechanism of synthesis of the protease, which could be 
translated by frameshift suppression of the gag termination
codon, is discussed. 

MATERIALS AND METHODS 

Viruses. BLV was grown in fetal lamb kidney cells (24). 
Gazdar murine sarcoma virus (Gz-MSV) was grown in HTG-2
cells (6). The viruses were purified by sucrose density 
gradient centrifugation by the Biological Products Labora- 
tory, Program Resources, Inc., National Cancer Institute 
Frederick Cancer Research Facility, Frederick, Md. 

Assay of protease. Gz-MSV, which itself has no protease 
activity and contains uncleaved Pr65gag (30), a homolog of
the polyprotein precursor of the gag (group-specific antigen) 
gene products in other murine viruses, was used as a 
substrate in assays for the protease activity. Protease activ- 
ity was assayed as previously described (29). In some cases, 
immune precipitates of BLV gag polyproteins synthesized in 
an in vitro translation system were incubated with the 
protease fractions under the same conditions. 

Purification of the protease. The method used to purify the 
protease was essentially the same method used previously 
for the mouse protease (28). To 60 mg of purified BLV 
suspended in 2 ml of 0.13 M NaCl-0.01 M Tris hydrochloride
(pH 7.2)-0.001 M EDTA (STE buffer), 20 volumes of cold 
(-70°C) acetone was added. The precipitate was dried under 
reduced pressure. The resulting acetone powder was then 
extracted with 4 ml of 0.02 M Tris hydrochloride (pH 7.2)-5 
mM dithiothreitol (Sigma Chemical Co., St. Louis, Mo.) (TD 
buffer) containing 1.0 M NaCl at 4°C for 30 min with constant
stirring, and the preparation was centrifuged at 10,000 × g
for 10 min at 4°C. The supernatant, which contained the 
protease activity, was then fractionated on a Sephacryl 

FIG. 1. Purification of protease by RP-HPLC. The protease-active
fraction from Sephacryl S-200 chromatography (see text) was
further fractionated by RP-HPLC on a µBondapak C18 column (0.39
by 30 cm; Waters Associates). The gradient conditions used were as
follows: 0 to 35% acetonitrile over a period of 2 min; isocratic at 35%
acetonitrile for 10 min; and 35 to 60% acetonitrile over a period of 50
min at a constant flow rate of 1.0 ml/min. One-tenth of each fraction
was lyophilized and analyzed for protein composition. Staining was
with Coomassie brilliant blue R-250. Lane V contained BLV (40 µg).

S-200 chromatography column (2.5 by 90 cm). After the 
protease activity of each fraction (3 ml) was determined, 
fractions were pooled and subjected to reverse-phase high- 
pressure liquid chromatography (RP-HPLC) by using a 
µBondapak C18 column (Waters Associates, Inc., Milford,
Mass.). Fractions (5 ml) were collected, and portions were 
removed for protein composition analysis, sodium dodecyl 
sulfate (SDS)-polyacrylamide gel electrophoresis (PAGE),
and protease activity measurements. 
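
The RP-HPLC gradient given in the Fig. 1 legend (0 to 35% acetonitrile over 2 min, isocratic at 35% for 10 min, then 35 to 60% over 50 min) can be written as a piecewise-linear function of time. The sketch below gives only the nominal programmed composition; it ignores system delay volume, and holding at 60% after the end of the program is an assumption made here, not something stated in the legend. The function name is illustrative.

```python
def acetonitrile_percent(t_min: float) -> float:
    """Nominal programmed % acetonitrile at time t_min (minutes).

    Segments follow the Fig. 1 legend. Holding at 60% after 62 min
    is an assumption of this sketch.
    """
    if t_min < 0:
        raise ValueError("time must be non-negative")
    if t_min <= 2:                      # 0 -> 35% over 2 min
        return 35.0 * t_min / 2.0
    if t_min <= 12:                     # isocratic at 35% for 10 min
        return 35.0
    if t_min <= 62:                     # 35 -> 60% over 50 min
        return 35.0 + 25.0 * (t_min - 12.0) / 50.0
    return 60.0                         # assumed hold after the program ends


for t in (0, 2, 12, 37, 62):
    print(f"{t:>3} min: {acetonitrile_percent(t):.1f}% acetonitrile")
```

The shallow final segment rises by 0.5% acetonitrile per minute, which is what allows closely eluting hydrophobic proteins to be resolved.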

SDS-PAGE. Various protein materials were analyzed by 
discontinuous SDS-PAGE (11). Proteins were visualized by
staining with either Coomassie brilliant blue R-250 (Bio-Rad 
Laboratories, Richmond, Calif.) or silver (25). Radio- 
fluorography was done by using a sodium salicylate treat- 
ment (3). 

Amino acid composition. Samples for amino acid analysis 
were hydrolyzed for 24 h in vacuo with 6 N HCl containing
0.1% liquid phenol and then dried by vacuum desiccation. 
Analyses were performed with a Durrum model D-500 amino 
acid analyzer by using ninhydrin to detect the eluted amino 
acids. 

NH2-terminal microsequence analysis. The purified protease
was subjected to automated Edman degradation in a gas
phase sequenator (9) by using the program supplied by the 
manufacturer (Applied Biosystems, Inc.). The amino acid 
anilinothiazolinones were converted to phenylthiohydantoin
derivatives by using 25% trifluoroacetic acid in water. Phenylthiohydantoin
amino acids were identified and quantitated
by RP-HPLC (8). 

COOH-terminal sequence analysis. Protein samples were 
digested with carboxypeptidase Y for various times as
previously described (14), and the amino acids released were 
quantitated by using the Durrum model D-500 analyzer. 

In vitro translation of BLV RNA. A 20-mg portion of
purified virus in STE buffer was disrupted with 1% SDS and 
extracted twice with water-saturated phenol. The RNA was 
precipitated by adding two volumes of cold (-20°C) ethanol
and was collected by low-speed centrifugation. The RNA 
was then suspended in 0.5 ml of STE buffer containing 1% 
SDS and was fractionated by centrifugation on a 5 to 20% 
(wt/vol) sucrose density gradient (made up in STE buffer 
containing 0.1% SDS) at 30,000 rpm for 150 min in an 
SW40.1 rotor at 20°C. Fractions were collected, and the 
RNA was again precipitated by adding 2 volumes of cold 
(-20°C) ethanol. The fractions were kept overnight at 
-20°C, and precipitates were collected by centrifugation at 
4°C. The precipitates were solubilized in 100 µl of water and
heated at 90°C for 2 min. A 5-µl portion of each fraction was
analyzed to determine the size of the RNA species on a 1% 
agarose gel (12) by using bovine liver RNA as a marker. The 
in vitro translation experiments were performed in 25-µl
incubation mixtures at 37°C for 60 min by using an in vitro 
translation kit (New England Nuclear Corp., Boston, Mass.) 




FIG. 2. Cleavage of Gz-MSV Pr65gag with purified BLV protease
(fraction 11 in Fig. 1). Portions (2.5%) of fraction 11 from the
experiment shown in Fig. 1 were lyophilized, solubilized in TD
buffer (pH 7.0) containing 0.5% Nonidet P-40 (100 µl), and then
incubated in the presence or absence of Gz-MSV (1.5 µg) as the
substrate. After incubation at 20°C for 16 h, a 25-µl portion of each
sample was analyzed by SDS-PAGE. Lane 1, Moloney murine
leukemia virus (7 µg); lane 2, Gz-MSV (1.5 µg) alone before
incubation; lane 3, Gz-MSV (1.5 µg) alone after incubation; lane 4,
fraction 11 alone (0.3 µg); lane 5, fraction 11 (0.3 µg) plus
Gz-MSV (1.5 µg) after incubation; lane 6, BLV (7 µg). The gel was
stained with silver nitrate (25).



828 YOSHINAKA ET AL. 

with 25 µCi of L-[35S]methionine (New England Nuclear
Corp.), a reduced concentration of potassium acetate (75
mM), 1.0 mM magnesium acetate, and RNA which was ice
quenched following heating at 90°C for 2 min. Samples (1 µl)
were removed to determine the radioactivity incorporated 
into the proteins, and the samples were analyzed by SDS- 
PAGE. 

Immunoprecipitation. Immunoprecipitation was done as
follows. Protein A-Sepharose (15 mg; Pharmacia Fine Chemicals,
Piscataway, N.J.) in 0.5 ml of Dulbecco phosphate-buffered
saline and 10 µl of serum were incubated for 1 h at
4°C. After washing with phosphate-buffered saline twice, the 
beads were incubated with a translation mixture, which was 
diluted 10-fold with immunoprecipitation buffer (0.02 M Tris 
hydrochloride pH 7.5, 0.1 M NaCl, 1% Nonidet P-40), for 16
h at 4°C. Then the beads were washed three times with 
immunoprecipitation buffer and once with water and analyzed
by SDS-PAGE.

RESULTS 

Detection of protease activity in BLV. To detect protease 
activity, the virus was incubated under the conditions used 
previously for the mouse protease assay (28). The optimum 
pH for protease activity was determined by incubating 30-µg
portions of purified virus at various pHs in TD buffer 
containing 0.5% Nonidet P-40 for 16 h at 22°C in the 
presence or absence of the exogenous substrate Gz-MSV 
Pr65gag. It appeared reasonable to use Gz-MSV Pr65gag as
the BLV protease substrate because the COOH-terminal 
amino acid sequence of the BLV gag proteins (-Pro-Ala-Ile- 
Leu-COOH) has been shown to be similar to the sequences 
of the mouse gag proteins (13, 15). When purified BLV alone 
was incubated under these conditions, no precursor-product 
relationship like that seen in the mouse system was observed 
(29, 30). However, when Gz-MSV Pr65gag was added to
BLV preparations in the assay described above, the Pr65gag
content decreased and there was a concomitant increase in



TABLE 1. Amino acid composition of BLV protease^a

Amino acid    No. of residues per        Predicted no. of
              protein determined         residues^b

Asp           10.32 (10)                 10
Thr            4.86 (5)                   5
Ser            7.75 (8)                   9
Glu           13.29 (13)                 11
Pro           12.07 (12)                 14
Gly           10.03 (10)                  9
Ala            9.56 (10)                  9
Val           10.02 (10)                 11
Met            0.66 (1)                   2
Ile            6.78 (7)                   8
Leu           15.83 (16)                 17
Tyr            2.79 (3)                   3
Phe            2.08 (2)                   2
His            0.95 (1)                   0
Lys            4.21 (4)                   3
Arg            8.00 (8)                   9
Cys           ND^c                        0
Trp           ND                          4

^a The determined molecular weight was 13,763, and the predicted molecular
weight was 14,035.

^b Amino acid composition predicted from the DNA sequence (18). Based on
the determined NH2 and COOH termini, the sequence from nucleotide 1677 to
nucleotide 2054 was used for this comparison.

^c ND, Not determined.
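
As a consistency check on Table 1, the nucleotide span cited in footnote b should encode a whole number of codons, and the predicted residue counts should add up to that number. The following is a small sketch of that arithmetic using only values from the table; the variable names are illustrative.

```python
# Predicted residue counts, copied from the "Predicted no. of residues"
# column of Table 1.
predicted = {
    "Asp": 10, "Thr": 5, "Ser": 9, "Glu": 11, "Pro": 14, "Gly": 9,
    "Ala": 9, "Val": 11, "Met": 2, "Ile": 8, "Leu": 17, "Tyr": 3,
    "Phe": 2, "His": 0, "Lys": 3, "Arg": 9, "Cys": 0, "Trp": 4,
}

# Nucleotide span used for the prediction (Table 1, footnote b); inclusive.
first_nt, last_nt = 1677, 2054
span_nt = last_nt - first_nt + 1
codons, remainder = divmod(span_nt, 3)

total_predicted = sum(predicted.values())

print(f"span: {span_nt} nt = {codons} codons (remainder {remainder})")
print(f"sum of predicted residues: {total_predicted}")
```

Both figures come to 126, so the predicted column accounts for every codon in the cited span.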
TABLE 2. N-terminal amino acid analysis of BLV protease

Cycle    Amino acid    Amt recovered^a

 1       Leu           210
 2       Ser            26
 3       Ile           203
 4       Pro           155
 5       Leu           167
 6       Ala           146
 7       Arg            65
 8       X
 9       Arg            93
10       Pro           100
11       Ser             9
12       Val^b
13       Ala            90
14       Val
15       Tyr            79
16       Leu            94
17       Ser             5
18       Gly            37
19       Pro            56
20       Trp            11
21       Leu            83
22       Gln            26
23       Pro            55
24       (Ser)           2
25       Gln            55
26^c
27       Gln            43
28       Ala            21
29       Leu            37
30       Met            12
31       Leu            46

^a Based on the phenylthiohydantoin amino acid derivatives.

^b Valine assignments were qualitative only due to the presence of an
artifactual peak in the chromatographic analysis of the phenylthiohydantoin
amino acids.

^c Sample was lost during workup.

the smaller proteins (data not shown). The activity was 
highest around pH 7.0 to 7.2. Thus, we decided to use 
Gz-MSV Pr65gag as a substrate in BLV protease assays to
monitor activity in the procedures developed for the purifi- 
cation of the enzyme. 

Purification of the protease. The protease (molecular 
weight, 14,000 [14K]) eluted from the Sephacryl S-200 column
(see Materials and Methods) was further purified by RP-HPLC,
using a shallow acetonitrile gradient on a µBondapak
C18 column (Fig. 1). Fractions (5 ml) were collected and
lyophilized for determinations of purity and activity. The 
peak activity was eluted in fraction 11 at an acetonitrile
concentration of about 52 to 54%. This activity was clearly
separated from proteins p12, p10, and p15. The purified
protein produced a single band in SDS-PAGE gels (Fig. 1, 
fraction 11). The amount of total protein recovered in 
RP-HPLC fraction 11 was 35 µg. The total recovery of
enzyme activity was 40%, as determined from a 50% reduction
in the Pr65gag band density after 16 h of incubation (28).
Relatively large amounts of activity were lost during extrac- 
tion from the acetone powder of the virus. This was probably 
caused by the hydrophobic nature of the protein, as indicated
by the position of its elution in the acetonitrile gradient.
However, this step was essential to remove the bulk of
hydrophobic and low-molecular weight contaminants. To 
identify the protein responsible for the protease activity, the 
purified protease fraction from RP-HPLC was incubated 
with Gz-MSV Pr65gag. As shown in Fig. 2, the activity was
found in the 14K protein (fraction 11). In this fraction Pr65gag

TGGAAACGAGACTGTCCAACCCTCAAATCAAAAAACTAATAGAGGGGGGACTTAGCGCCC 1620
(fr1) (gag-p12)-CysProThrLeuLysSerLysAsnTrmTrm

(fr3) GluThrArgLeuSerAsnProGlnIleLysLysLeuIleGluGlyGlyLeuSerAlaPro

CCCAAACCGTAACCCCTATAACAGATCCTCTTAGTGAGGCCGAATTAGAATGCCTACTTT 1680
GlnThrValThrProIleThrAspProLeuSerGluAlaGluLeuGluCysLeuLeuSer

CTATTCCTCTGGCTCGCAGCCGTCCCTCCGTGGCTGTATACCTGTCTGGCCCCTGGCTGC 1740
IleProLeuAlaArg X ArgProSerValAlaValTyrLeuSerGlyProTrpLeuGln



CCCCCATGGTAGGCGTCCTAGATGCCCCCCCGAGCCACATTGGATTAGAACATTTGCCCG 2100
ProMetValGlyValLeuAspAlaProProSerHisIleGlyLeuGluHisLeuProVal

TCCCACCTGAGGTACCTCAATTCCCTTTAAACTAGAACGCCTCCAGGCCCTTCAAGACCT 2160
(fr3) ProProGluValProGlnPheProLeuAsnTrm

(fr2) ***GlyThrSerIleProPheLysLeuGluArgLeuGln (RT)

FIG. 3. Alignment of the determined NH r and COOH-terminal amino acid sequences (underlined) of the protease with the amino acid 
sequence deduced from the DNA sequence (18). RT, Reverse transcriptase; frl, frame 1. 



was cleaved into 44K, 40K, and 20K major bands (all 
doublets), including minor proteins after incubation (Fig. 2, 
lane 5). The BLV protease purified in this way apparently 
did not process Gz-MSV Pr65gag to the usual final products
(p15, p30, p12, and p10). It remains to be determined which
sites in Gz-MSV Pr65gag are cleaved by the BLV protease.
This could be accomplished by an NH2- and COOH-terminal
analysis of each product. 

Chemical analysis of the protease. The amino acid compo- 
sition data for the protease are shown in Table 1. The total 
number of amino acids, not including cysteine and
tryptophan, was 126, and the molecular weight calculated
from these data was 13,763. The composition which we 
determined is in good agreement with the composition 
predicted from the nucleotide sequence (Table 1). To deter-
mine the NH2-terminal amino acid sequence of the purified
protease recovered from fraction 11, 0.25 nmol of protein 
was subjected to analysis on a gas phase sequenator. The 
amino acids identified at each cycle (of the first 31 cycles) are 
shown in Table 2, together with the quantitative yields. To 
determine whether the protease is virus encoded, we aligned 
the NH2-terminal amino acid sequence with the amino acid
sequence deduced from the nucleotide sequence of the 
DNA. As shown in Fig. 3, the protease amino acid sequence 
starts with a leucine residue encoded by the triplet at 
nucleotides 1677 to 1679 in the pol gene (18), 70 nucleotides 
downstream from the end of gag gene, and located, as 
expected, in a reading frame different from that of gag (18). 
The following two lines of evidence suggested that the 
protease COOH terminus is at nucleotides 2052 to 2054 and 
not at nucleotides 2130 to 2132 of this open reading frame 
adjacent to the termination codon (UAG): (i) the molecular 
weight estimated from the SDS-PAGE profile and the amino 
acid composition data (Fig. 1 and Table 1) was about 14,000; 
and (ii) the data from a preliminary analysis showed that 



the COOH-terminal sequence of this protein was -Val-Gly-
COOH. The protein and nucleotide sequence data together
clearly showed that the protease-encoding region located in 
the genome from nucleotide 1677 to nucleotide 2056 is in a 
different frame than gag and the putative reverse transcrip- 
tase gene. 
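
The reading-frame assignment above can be checked with a short sketch. It translates the first sequence line of Fig. 3 (the 60 nucleotides ending at position 1620) at two offsets and counts the codons between nucleotides 1677 and 2054. The function names and the one-letter amino acid output are illustrative choices, not part of the paper, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# First sequence line of Fig. 3 (the 60 nucleotides ending at position 1620).
FIG3_LINE1 = "TGGAAACGAGACTGTCCAACCCTCAAATCAAAAAACTAATAGAGGGGGGACTTAGCGCCC"


def translate(dna, offset):
    """Translate dna from a 0-based offset; '*' marks a termination codon."""
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


def codon_count(first_nt, last_nt):
    """Number of whole codons in an inclusive nucleotide interval."""
    return (last_nt - first_nt + 1) // 3


print(translate(FIG3_LINE1, 0))  # frame labeled fr1 in Fig. 3 (end of gag)
print(translate(FIG3_LINE1, 2))  # frame labeled fr3 in Fig. 3 (protease)
print(codon_count(1677, 2054))   # Leu codon through the COOH-terminal Gly codon
```

At offset 0 the translation runs into two consecutive termination codons (...KSKN**), as on the fr1 line of Fig. 3, whereas offset 2 reads through as ETRLSNPQIKK..., as on the fr3 line. The interval from 1677 to 2054 holds 126 codons.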

In vitro processing of BLV gag precursors by the protease. 
To determine whether the BLV protease cleaves the BLV 
polyprotein precursor accurately, we used the polyprotein 
substrates synthesized in vitro. BLV RNA-directed cell-free 
protein synthesis yielded three major proteins, designated 
the 66K, 44K, and 14K proteins. The 66K and 44K proteins 
were related to the gag gene, as demonstrated by immuno- 
precipitation with anti-p24 antiserum (Fig. 4B, lane 5, and 
Fig. 4C, lane 7). The 14K protein is not considered here; it is 
described elsewhere (31). After the immune complexes 
(antigen-antibody-protein A-Sepharose complexes) of the 
66K and 44K proteins were incubated with the partially 
purified protease under the assay conditions described 
above, the cleavage products were analyzed by SDS-PAGE. 
As shown in Fig. 4, the same sizes of BLV gag proteins (p24, 
p15, p12, and p14 [protease]) were generated by the protease
treatment (see below) (Fig. 5). Additional bands, labeled 
bands m and n, which were also present in the purified virus 
preparation, were detected as cleavage products (Fig. 4C, 
lane 9). We identified band m as p10, an NH2-terminal
polypeptide of gag polyproteins. From these results, we
concluded that the purified protease processes the BLV gag 
precursors in a specific manner. 

DISCUSSION 

BLV protease was purified, and its NH2- and COOH-
terminal amino acid sequences were determined. Our results
show that the gene for the BLV protease (Mr, 14,000) is
located between the gag gene and the putative reverse 
FIG. 4. In vitro cleavage of the BLV gag precursor proteins with the BLV protease. Whole incubation mixtures (100 µl; labeled with
L-[35S]methionine or L-[35S]cysteine) of the BLV RNA-directed in vitro translation system were immunoprecipitated with anti-p24 rabbit
antiserum (see text). The immune complex was then incubated with the BLV protease fraction under the assay conditions described in the
text. After incubation, proteins were analyzed by SDS-PAGE and stained with Coomassie brilliant blue R-250, and then autoradiography was
performed by using sodium salicylate. (A) Stained gel of the L-[35S]methionine labeling experiment. Lane 1, BLV whole virus (40 µg); lane
2, partially purified protease (the RP-HPLC fraction used in this experiment contained some p24 in addition to protease p14); lane 3, Gz-MSV
(7.5 µg); lane 4, Gz-MSV plus partially purified protease; lane 5, the immune complex alone; lane 6, the immune complex plus partially
purified protease. (B) Autoradiogram of the L-[35S]methionine labeling experiment. The stained gel (panel A) was exposed to X-ray film. Lanes
1' to 4', Tracings of lanes 1 to 4 in the stained gel (panel A); lane 5', the immune complex alone; lane 6', the immune complex plus partially
purified protease. (C) Autoradiogram of the L-[35S]cysteine labeling experiment. Lane 7, The immune complex alone (no incubation); lane 8,
the immune complex alone (incubated); lane 9, the immune complex plus partially purified protease; lane 10, partially purified protease; lane
11, BLV whole virus (40 µg) (tracings of the stained gel).



transcriptase gene and is coded in a reading frame that is 
different from the reading frames of the gag and reverse 
transcriptase genes (18). In both Moloney murine leukemia 
virus and feline leukemia virus (FeLV) the protease gene is 
located in the pol gene in the same reading frame as the gag 
gene (27, 28); and in the avian retroviral system, the protease 
gene is located in the 3' region of the gag gene (22). Thus, in 
BLV the organization of the protease gene in the viral 
genome is different from the organization in both murine 
leukemia virus and FeLV systems and avian systems. How- 
ever, within the protease-encoding region the translated 
BLV sequence shows homology with the avian and mam- 
malian type C virus sequences over the entire length of the 
region. The BLV protease is most closely related to mouse 
protease (18). 

Based on an NH2- and COOH-terminal analysis of the gag
proteins, we suggested previously (13, 27) that the cleavage
sites in human T-cell leukemia virus and FeLV gag precur-
sors are very similar to the cleavage sites in the mouse gag
polyprotein. The cleavage sites in BLV are also similar in the
COOH-terminal amino acid sequences of each gag protein to
the cleavage sites in the above-mentioned retroviruses (4,
13, 14, 27). Based on this similarity, we used Gz-MSV
Pr65gag as the substrate for the BLV protease during purifi-
cation steps. Purified BLV protease was capable of cleaving



the mouse gag precursor but apparently did not readily 
produce the correct final products (Fig. 2 and 4). However, 
we found that purified or partially purified BLV protease 
cleaved in vitro synthesized BLV polyproteins into the 
expected viral gag components without yielding other major 
products. Furthermore, when we used other proteins, such 
as bovine serum albumin or the mouse immunoglobulin 
heavy chain, as the protease substrate, no cleavage was 
observed. The exact specificity of the in vitro protease 
cleavage remains to be determined. 

Another interesting question is the mechanism of the 
synthesis of BLV protease. In murine leukemia virus and 
FeLV, the protease is synthesized by readthrough of an 
amber termination codon at the end of the gag gene to 
produce a gag-pol fusion polyprotein. After being syn-
thesized under translational control in a quantity that is
small relative to the amount of gag precursor (Pr65gag),
Pr180gag-pol is processed by proteolytic cleavage, and the
protease is generated as Thr-Leu-Asp-Asp (from the gag
gene)-Gln-(pol gene product). An intriguing hypothesis has
been proposed for BLV protease synthesis (18); according to 
this hypothesis a -1 frameshift might occur near the end of 
the gag gene. In that region, a guanine-cytosine-rich hairpin 
structure and six consecutive adenine residues are found; 
both characteristic sequences have been thought to be 
FIG. 5. Model for the synthesis of gag and protease proteins. Based on the DNA sequence data and the results obtained from the cleavage
patterns in the processing of the gag-related 66K and 44K polyprotein precursor by the protease, a model for the origin of the BLV protease
was made. (A) Coding frame of the gag and protease in the BLV genome. Symbols: *r, termination codon; methionine residue found in
the DNA sequence; A, cysteine residue found in the DNA sequence; fr., frame. (B) Translated product from coding frames 1 and 3. The
partitions determine the NH2 and COOH termini (cleavage sites) of each protein. Symbols: *, point where a frame shift (from frame 1 to frame
3) could occur; another possible cleavage by BLV protease. (C) Final cleavage products, detected by labeling with either L-[35S]methionine
or L-[35S]cysteine. All of these products were apparently identical to the products found in the purified virus. Both p12 (six cysteines) and band
"n" (p12 + x; seven cysteines) are seen in Fig. 4. Only • V* is indicated above.



important for translational control of mRNA in procaryotes 
as well as eucaryotes (1, 5, 10). Not only the precise
localization and reading frame assignment as described in
this paper, but also the results of our biochemical studies
with in vitro synthesized proteins are compatible with
this hypothesis. As reported previously by Ghysdale et al.
(7) and as shown in Fig. 4, the 66K and 44K proteins were 
gag-related precursors. If frameshift suppression of the gag 
termination codon is the mechanism involved in the synthe- 
sis of the 66K polyprotein, we can expect the final cleavage 
products which we show in Fig. 5C. These products were 
obtained after incubation of the in vitro synthesized 66K and 
44K polyproteins labeled with either [35S]methionine or
[35S]cysteine. As shown in Fig. 4, we detected p14 protease,
p24, and p12 labeled with [35S]methionine (Fig. 4B, lane 6')
and p24, p15, band n, p12, and band m labeled with
[35S]cysteine. p24, p15, and p12 are known structural proteins
of BLV. Band n, which migrated slower than p12 (Fig. 4), is
what would be expected to be a product of [35S]cysteine-
labeled 66K protein after incubation with protease consisting
of a major portion of p12 plus downstream sequences in the
protease reading frame. As for the fastest-migrating band
(band m) detected in the [35S]cysteine labeling experiment,
we determined that this protein was p10, an NH2-terminal
piece of p15. This protein was also found to be myristylated
in the virus (unpublished data). Therefore, all of these 
products, including the band n and m proteins, which were 
obtained after cleavage of the 66K and 44K precursors with 
the protease (Fig. 4), match our model (Fig. 5) for the 
synthesis and cleavage of gag and the protease. From these 
results it seems likely that the protease is synthesized by 
frameshift suppression of the gag termination codons. 

The preliminary results of our in vitro translation experi-
ments with lysine-specific tRNA (tRNA5Lys; a gift from B. J.
Ortwerth, University of Missouri, Columbia) may support 
this conclusion. In these experiments, the synthesis of the 



gag polyprotein precursor 66K was promoted by the addi- 
tion of lysine-specific tRNA to the BLV genomic RNA- 
directed cell-free translation system. The degree of promo- 
tion was proportional to the amount of the tRNA added, and 
about two- to threefold stimulation was obtained with 0.02 U 
of optical density at 260 nm. Synthesis of the 44K and 14K 
proteins was not promoted at this concentration of tRNA. 

A direct amino acid sequence analysis of gag-protease
fusion protein could elucidate further details of the transla- 
tional control and processing of BLV proteins. The other 
possibility, that a spliced mRNA is required for the synthesis 
of the gag-protease polyprotein, cannot be ruled out. In any
case, the diversity of translational control mechanisms at the 
gag-pol junction of retroviruses, such as avian virus, murine 
leukemia virus, FeLV, BLV, and human T-cell leukemia 
virus, provides useful models for examining features of gene 
expression in eucaryotic cells. It also remains to be deter- 
mined whether the protease itself cleaves all of the sites in 
gag-protease (66K) polyprotein or whether a cellular prote-
ase triggers the initial cleavage event. As shown in Fig. 4, we
incubated the 66K and 44K polyproteins after immunopre- 
cipitation with antibody without added protease and did not 
observe lower-molecular-weight products. Despite our in- 
ability to demonstrate self-cleavage in vitro of the 66K 
protein which contains the protease sequence, the possibility 
of autocatalytic cleavage under natural conditions cannot be 
ruled out. 
ABSTRACT Retroviral capsid proteins and replication
enzymes are synthesized as polyproteins that are proteolytically
processed to the mature products by a virus-encoded protein-
ase. We have purified the proteinase of human immunodefi-
ciency virus (HIV), expressed in Escherichia coli, to ≈90%
purity. The purified enzyme at a concentration of ≈20 nM gave
rapid, efficient, and specific cleavage of an in vitro synthesized
gag precursor protein. Purified HIV proteinase also induced
specific cleavage of five decapeptide substrates whose amino
acid sequences corresponded to cleavage sites in the HIV
polyprotein but not of a peptide corresponding to a cleavage site
in another retrovirus. Competition experiments with different
peptides allowed a ranking of cleavage sites. Inhibition studies
indicated that the HIV proteinase was inhibited by pepstatin A
with an IC50 of 0.7 µM.



The capsid and nonstructural proteins of all retroviruses, 
including human immunodeficiency virus (HIV), are synthe- 
sized as polyprotein precursors that are proteolytically pro- 
cessed to the mature viral proteins by a virus-encoded, 
virus-associated proteinase (for a review see ref. 1). Viral 
proteinases (PR; for the new nomenclature for common 
retroviral proteins see ref. 2) have been purified from virions 
and biochemically characterized for a number of avian (3, 4) 
and mammalian (5-7) retroviruses. These enzymes share 
limited amino acid sequence homology with members of the 
aspartic proteinases (8, 9) and invariably contain the se- 
quence Asp-Thr(Ser)-Gly corresponding to the catalytic 
center of cellular aspartic proteinases. However, retroviral 
enzymes are much smaller than cellular aspartic proteinases 
and contain only a single homologous catalytic center. 

The proteolytic activity of HIV has been mapped to an
11-kDa protein that is encoded immediately upstream of the
viral reverse transcriptase (RT) and appears to be generated 
by autocatalytic release from a larger precursor (10-12). 
Replication of infectious HIV particles is entirely dependent 
on the generation of active PR, and a mutation of the putative 
active site Asp residue in the PR gene resulted in the 
production of "immature," noninfectious particles consist- 
ing of uncleaved precursor proteins (13). Although genetic 
and biochemical evidence has mapped the proteolytic activ- 
ity to this specific segment of the HIV genome, the mecha- 
nisms of PR formation and action are still unknown. Is 
dimerization of the enzyme a prerequisite for function (14) 
and, if so, how are the initial cleavages performed? Does the 
enzyme require activation or relief of inactivation or is 
processing confined to a specific (virus-induced?) compart- 
ment (for a review see ref. 1)? In addition to providing insight 
into the biology of retrovirus replication, studies of the
enzymatic functions of HIV have recently received much 
attention as these enzymes are potential targets for antiviral 
drugs. HIV PR, like the proteinases of other human patho- 
genic viruses, such as poliovirus, appears sufficiently distinct 
from cellular endopeptidases that inhibitors of the viral 
enzyme may not be harmful to the host. These studies require 
sufficient quantities of pure, active enzyme and functional 
assays in a peptide-based system as well as on a natural
substrate. 

We report here purification of enzymatically active HIV 
PR from an Escherichia coli expression system. The purified 
enzyme can cleave its natural substrate, obtained by trans-
lation in vitro, as well as five different decapeptides designed
according to cleavage sites in the HIV polyprotein. Process- 
ing of these peptides occurred specifically at the site that is 
also cleaved in vivo, and a non-HIV-specific peptide was not 
cleaved by HIV PR. Competition experiments with different 
peptide substrates allowed ranking of the peptides according 
to the relative rates of cleavage. 

MATERIALS AND METHODS 

Purification of HIV PR. Construction of plasmid pHIV-
proPII and expression in E. coli BL21(DE3) have been
described (12). In the present study we used BL21(DE3) 
containing the T7 lysozyme gene on a chloramphenicol- 
resistance plasmid [BL21(DE3)plysS] as the expression 
strain. T7 lysozyme has been shown to inhibit T7 RNA
polymerase (15), thus providing a more stringent control of 
base line expression. Bacteria transformed with pHIVproPII 
were grown in M9CA medium (16) with ampicillin and 
chloramphenicol in a 14-liter fermentor (Microferm MMF-14, 
New Brunswick Scientific, New Brunswick, NJ), essentially 
as described (17). Bacteria were collected by centrifugation 
and stored as a wet paste at -80°C. 

For each purification, bacterial paste was thawed in 50 mM
2-(N-morpholino)ethanesulfonic acid (Mes), pH 6.5/0.1 M
NaCl/10 mM MgCl2/1 mM EDTA and lysed in a French
pressure cell at 75 MPa. The lysate was centrifuged for 15 min
at 10,000 x g and 60 min at 200,000 x g, the supernatant was
collected and made 10 mM in EDTA, and (NH4)2SO4 was
added to 50% saturation. The precipitate was redissolved in
50 mM Tris-HCl, pH 8.0/30 mM NaCl/1 mM EDTA and
layered on a DEAE-cellulose column (Whatman DE52)
equilibrated with the same buffer. The flow-through was
adjusted to pH 6.5, made 1 M in (NH4)2SO4, and layered on
a hexylagarose (Sigma) column equilibrated with 1 M
(NH4)2SO4 in 50 mM Mes, pH 6.5/1 mM EDTA (buffer A).



Abbreviations: HIV, human immunodeficiency virus; TFA, trifluo- 
roacetic acid; PR, protease; RT, reverse transcriptase. 
The column was washed with the same buffer and with 0.85
M (NH4)2SO4 in buffer A and was eluted with 50 mM
(NH4)2SO4 in buffer A. Protein was precipitated with
(NH4)2SO4, redissolved in 50 mM Mes, pH 6.5/150 mM NaCl/1 mM
EDTA (buffer B), and loaded on a 60 x 1.6 cm column of
Sephadex G-50 fine (Pharmacia) in buffer B at 9 ml/hr. Peak
fractions were precipitated with (NH4)2SO4, redissolved in
buffer B, and loaded on a Superose 12 HR10/30 column
(Pharmacia) at 0.5 ml/min.

In a control experiment BL21(DE3)plysS were trans- 
formed with plasmid pHIVproP (12). Purification was per- 
formed as described above except that the final Superose 12 
and the hexylagarose step were omitted. Instead, the DEAE 
flow-through was precipitated by addition of (NH4)2SO4 and
the precipitate was redissolved in buffer B and loaded on the 
Sephadex G-50 column. 

In Vitro Transcription and Translation. Synthetic RNAs 
were transcribed with T7 RNA polymerase in vitro (18) from 
plasmids pHIVg/pII and pHIV FSII (12). Both plasmids
contain the HIV cDNA sequence from nucleotides 221 to 
2129 but pHIV FSII has a 4-base-pair insertion at nucleotide 
1640 of the HIV cDNA leading to a switch from the gag 
reading frame to the pol reading frame (12). Synthetic RNAs 
(final concentration, 50 µg/ml) were translated in rabbit
reticulocyte lysate (Promega Biotec) for 60 min at 30°C as 
described by the supplier. 

Synthesis and Purification of Peptides. Peptides were syn- 
thesized by the Merrifield method on an Advanced Chemtech 
automated synthesizer and were cleaved from the resin by 
liquid HF at 0°C in the presence of anisole and dimethyl 
sulfide as scavengers. Dried peptide resin mixtures were 
washed with diethylether and peptides were extracted with 
water and 80% acetonitrile in water. The extracts were 
lyophilized and the crude peptides were purified by reversed- 
phase HPLC using 0.1% aqueous trifluoroacetic acid (TFA)/ 
acetonitrile-based mobile phases. The lyophilized products 
were characterized by fast atom bombardment mass spec- 
trometry, HPLC, and amino acid analysis. 

Peptide Cleavage by HIV PR. Reactions were performed at
30°C in 20 or 40 µl of 50 mM sodium phosphate, pH 6.0/25
mM NaCl/5 mM EDTA/1 mM dithiothreitol with 0.44 mM
decapeptide as substrate and 2 µl of partially purified PR (in
purification buffer B). The reaction was quenched with a
4-fold excess of 0.1% TFA and frozen. Reaction products
were analyzed by reversed-phase HPLC using a Vydac C18
analytical column with a gradient from 95% A/5% B to 100% 
B (A is 0.1% TFA in water; B is 0.09% TFA in 60% 
acetonitrile/40% water) in 30 min at a flow rate of 1 ml/min. 
Absorbance was monitored at 215 nm. Cleavage products 
were identified by sequencing on an Applied Biosystems 
model 477A sequencer equipped with a model 120A phenyl-
thiohydantoin analyzer. Fast atom bombardment mass spec-
trometry was carried out on a Kratos MS-80 RFAQ double-
focusing mass spectrometer.
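
The solvent composition during the analytical gradient described above follows from simple interpolation. The sketch below assumes the gradient is linear in time, which the text does not state explicitly; the function name and its default parameters are illustrative only.

```python
def acetonitrile_percent(t_min, total_min=30.0, b_start=5.0, b_end=100.0,
                         acn_in_b=60.0):
    """Nominal % acetonitrile at time t_min of a linear A-to-B gradient.

    Solvent A contains no acetonitrile; solvent B contains acn_in_b %
    acetonitrile. Linearity of the gradient is an assumption.
    """
    if not 0.0 <= t_min <= total_min:
        raise ValueError("time outside the gradient")
    percent_b = b_start + (b_end - b_start) * t_min / total_min
    return percent_b * acn_in_b / 100.0


for t in (0, 15, 30):
    print(t, acetonitrile_percent(t))
```

Under these assumptions the run starts at a nominal 3% acetonitrile (5% B) and ends at 60% acetonitrile (100% B).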

Competition Experiments. Reactions were carried out at
37°C in buffer consisting of 50 mM Mes, 25 mM NaCl, and 5%
dimethyl sulfoxide at pH 6.0. When the substrate contained
methionine or cysteine residues, 1 mM dithiothreitol was
included. Substrate concentration was 250 µM BI-P-136 and
250 µM of another decapeptide. Reactions were started by
adding 5-10 µl of enzyme in purification buffer B. For each
time point a 20-µl sample was removed and added to 0.1 ml
of 2% TFA in water. Peptides were separated on a Nucleosil
C18 column with a 26-min linear gradient at 1 ml/min from 9%
to 56% acetonitrile in water with 0.05% TFA and elution was
monitored at 210 nm. For each substrate and its products, the
area under the combined peaks was independent of the extent
of conversion of the substrate. Thus, substrate and products
were detected and recovered with equal efficiencies and
comparison of areas yielded the extent of conversion.



When two substrates compete for the active site of an
enzyme, kinetic analysis yields the ratio of Vmax/Km (19)
rather than comparisons of either Vmax or Km individually.
The rate constant ratios were determined from the extent of
conversion of each substrate using the equation
(Vmax/Km)_1/(Vmax/Km)_2 = log(1 - F_1)/log(1 - F_2), where F is the fraction
of substrate that is converted to products (20).
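
As a worked illustration of this equation, the sketch below computes the Vmax/Km ratio from two fractional conversions measured in the same reaction mixture at the same time point. The function name and the example values are hypothetical and are not data from the paper.

```python
import math


def specificity_ratio(f1, f2):
    """Return (Vmax/Km)_1 / (Vmax/Km)_2 = log(1 - F_1) / log(1 - F_2).

    f1 and f2 are the fractions of substrates 1 and 2 converted to
    products in one competition experiment (same mixture, same time).
    """
    for f in (f1, f2):
        if not 0.0 < f < 1.0:
            raise ValueError("fractions must lie strictly between 0 and 1")
    return math.log(1.0 - f1) / math.log(1.0 - f2)


# Hypothetical example: substrate 1 is 75% converted when substrate 2 is 50% converted.
print(specificity_ratio(0.75, 0.50))
```

The ratio does not depend on the base of the logarithm, and fractions of exactly 0 or 1 are rejected because the expression is undefined there.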

RESULTS 

Purification of Biosynthetic HIV PR. We (12) and others (10,
11, 13, 21, 22) have recently reported expression of active 
HIV PR in E. coli. In our initial studies (12) we demonstrated 
that HIV PR was produced in E. coli BL21(DE3) transformed 
with the expression plasmid pHIVproPII, which contains the 
5' terminal part of the pol reading frame of HIV under the 
control of upstream elements of bacteriophage T7 gene 10. 
The mature enzyme (99 amino acids; refs. 10 and 22) was 
released by autocatalytic cleavage of the precursor (12). The 
expression level for HIV PR was low and the protein could 
not be identified on a Coomassie blue-stained gel of bacterial 
lysates (Fig. 1a, lane 1). Large quantities of bacterial cells
were grown in a bench-top fermentor because of this low
expression level.

We followed purification of the enzyme by immunoblot
analysis using a polyclonal antiserum against HIV PR. This
antiserum detected a single protein in lysates of induced
bacteria (Fig. 1b, lane 3) and did not react with lysates that
did not carry the expression plasmid (Fig. 1b, lane 1). The
majority of HIV PR was soluble after high-speed centrifuga-
tion (Fig. 1a, lanes 2 and 3). The enzyme was quantitatively
precipitated by (NH4)2SO4 at 50% saturation (Fig. 1a, lane 4)
and passed through a DEAE-cellulose column at low ionic
strength. After this step, the enzyme could be visualized on
a stained gel (Fig. 1a, lane 5) and was determined by laser
densitometry to be ~1% of total protein. The material was
Fig. 1. Purification of HIV PR. (a) Aliquots taken at each step
of the purification protocol were separated by NaDodSO4/10-20%
polyacrylamide gel electrophoresis and proteins were stained with
Coomassie blue. AA, amino acids. Lane M refers to a marker lane
containing 1 µg each of seven molecular size standards (indicated in
kDa). In lanes 1-6, 20 µg of protein (determined using the Bio-Rad
assay) was loaded, whereas lanes 7-9 contained ≈0.5 µg, ≈5 µg, and
≈1.5 µg, respectively. Lanes: 1, lysate of induced bacteria carrying
pHIVproPII; 2, S-10 supernatant; 3, S-200 supernatant; 4, 50%
(NH4)2SO4 precipitate; 5, DEAE flow-through; 6, hexylagarose
eluate; 7, pooled peak fractions from G-50 column; 8, (NH4)2SO4
precipitate of G-50 pool; 9, pooled peak fractions from Superose 12
column. (b) Immunoblot analysis of bacterial fractions. Twenty
micrograms of lysate from bacteria carrying the vector plasmid (lane
1), the deletion plasmid pHIVproP (lane 2), or the expression plasmid
pHIVproPII (lane 3) and 1.5 µg of purified HIV PR (lane 4) were
analyzed by gel electrophoresis and proteins were transferred to
nitrocellulose for 2 hr at 0.2 A. The nitrocellulose paper was then
probed with a polyclonal antiserum against HIV PR. Detection was
with goat anti-IgG (rabbit), coupled to alkaline phosphatase (Tago),
and with indolyl phosphate/nitro blue tetrazolium as the substrate/
indicator system, essentially as described (23).
then passed over a hexylagarose column (Fig. 1a, lane 6)
followed by molecular size chromatography on a Sephadex
G-50 column (Fig. 1a, lanes 7 and 8) and a Superose 12
column. After this multistep purification, HIV PR was ≈90%
pure as determined by laser densitometry of the gel shown in
Fig. 1a, lane 9. Since the initial expression was low it is
difficult to assess the level of overall recovery of protein but
we could isolate ≈25 µg of HIV PR from 10 g of wet E. coli
cell paste. Purified HIV PR was stable at -80°C for several 
weeks and did not appreciably lose activity on repeated 
freeze-thaw cycles. 

HIV PR obtained by this purification reacted strongly with 
the polyclonal antiserum in immunoblot analysis (Fig. 1b,
lane 4). The N-terminal sequence of the protein was deter-
mined by automated Edman degradation on an Applied
Biosystems model 477A protein sequencer, using an o- 
phthaldialdehyde cycle in the first step to block N-terminal 
residues present other than proline as described by the 
supplier. Sequencing of the first five residues yielded Pro- 
Gln-Ile-Thr-Leu, the sequence found at the amino terminus 
of HIV PR (10, 22). 

Cleavage of HIV gag Precursor by Purified HIV PR. In our
initial experiments we observed that crude extracts from
bacteria carrying plasmid pHIVproPII induced cleavage of in
vitro synthesized gag precursor pr53, whereas bacterial
extracts carrying pHIVproP, which has a deletion of the 17
C-terminal codons of HIV PR, did not catalyze this reaction 
(12). We now used this trans assay to monitor activity of 
purified HIV PR. 

Fig. 2, lane 4, shows the products of translation of a 
synthetic mRNA containing the coding region for HIV gag 
and, in a different reading frame, for PR. The primary 
translation product was the gag precursor pr53, which was 
completely stable to incubation in buffer alone (Fig. 2, lane 
5; see also ref. 12). Incubation of the gag precursor with 
purified HIV PR (Fig. 2, lanes 7-13) gave rapid and efficient 
processing to HIV capsid proteins. In lane 7, ≈0.5 µM HIV
PR was present in the incubation and in lane 8, ≈20 nM
Fig. 2. Proteolytic processing of in vitro synthesized HIV gag
precursor proteins. Samples of translation mixtures programed with
no mRNA (lane 1), FSII RNA (lane 3), or g/pII RNA (lanes 4-13)
either were mixed directly with sample buffer (lanes 1-3) or incu-
bated with or without addition of bacterial fractions. Cleavage
reactions were carried out in 20 µl final volume using 1.5 µl of
translation mix as substrate and 1 µl of purified enzyme. Incubation
was in 50 mM Mes, pH 6.0/20 mM NaCl/5 mM EDTA for 60 min at
30°C unless otherwise stated. Reactions were analyzed on 12.5%
polyacrylamide/NaDodSO4 gels. Kodak XAR-5 film was exposed to
the dried gels for 16 hr. Lanes: 1, no mRNA; 2, mixture of
14C-methylated proteins obtained from Amersham (sizes indicated in
kDa); 3, FSII RNA; 4, g/pII RNA, no incubation; 5-13, g/pII RNA
incubated with buffer alone for 1 hr (lane 5), partially purified extract
from bacteria carrying plasmid pHIVproP for 1 hr (lane 6), ≈0.5 µM
HIV PR for 1 hr (lane 7), or ≈20 nM HIV PR for 1 hr at 30°C (lane
8), 1 hr at 37°C (lane 9), 2 min at 30°C (lane 10), 5 min at 30°C (lane
11), 15 min at 30°C (lane 12), and 30 min at 30°C (lane 13); 14, marker
lane as in lane 2. Note that a protein of ≈42 kDa was seen even
without incubation with PR (lanes 4-6). This protein is distinct from
the cleavage intermediate and is derived from internal initiation at
Met-142 of the gag reading frame.



purified HIV PR was used. The lower concentration of
enzyme gave almost complete cleavage of pr53 to yield the
major capsid protein CA/CA+ (CA+ specifies the C-
terminally extended capsid protein p25; ref. 24) and two
intermediates of ≈42 kDa and ≈18 kDa. Both intermediates
were completely processed when 0.5 µM HIV PR was used.
The 18-kDa protein probably corresponds to the C-terminal
intermediate of gag, which is further processed to the
nucleocapsid protein NC and protein p9 (24). By using a
different gel system, we could detect a product that migrates
at ≈8-10 kDa (data not shown). The 42-kDa protein probably
corresponds to an intermediate containing the matrix protein
MA and CA/CA+ (12). The 42-kDa intermediate and CA
comigrated with the corresponding products from an in vitro
synthesized precursor containing equimolar amounts of gag
and PR, which yielded efficient autocatalytic processing (Fig.
2, lane 3). Incubation of pr53 with material purified from
bacteria carrying the deletion plasmid pHIVproP (12) did not
result in any cleavage (Fig. 2, lane 6), although 50-fold more
protein was used than in the experiment shown in lane 7.

We followed processing of pr53 with purified HIV PR in a 
time-course experiment (Fig. 2, lanes 10-13). Cleavage 
occurred very rapidly and cleavage products were already 
observed after a 2-min incubation (Fig. 2, lane 10). The 
cleavage assay was optimally performed at 30°C (Fig. 2, lanes 
8 and 9). The pH optimum for cleavage was between pH 5.5 
and 6.5, with pH >7 strongly inhibiting the activity, and the 
optimal salt concentration was between 20 and 100 mM NaCl 
(data not shown). 

Cleavage of Peptide Substrates by Purified HIV PR. While 
the purification protocol was being developed, we tested the 
activity of partially purified HIV PR on synthetic peptide 
substrates. These peptides were modeled according to cleav- 
age sites in the HIV gag-pol precursor. Fig. 3 shows that HIV 
PR induced cleavage of a decapeptide containing the se- 
quence at the PR/RT cleavage site in the HIV gag-pol 
precursor (peptide BI-P-127; see Table 1). Two products, 
distinct from the parent peptide, were resolved by reversed- 
phase HPLC (Fig. 3 b and c). Sequence analysis of the 
peptide products confirmed that they had the sequences 
C-T-L-N-F and P-I-S-P-I, respectively, and therefore re- 
sulted from cleavage at the Phe/Pro bond. No other peptide 
material could be found in any other fraction. Integration of 

Fig. 3. Reversed-phase HPLC analysis of peptide BI-P-127 and 
HIV PR-generated cleavage products. The peptide was incubated for 
4 hr in buffer alone (a) or for 1 hr (b) or 4 hr (c) with purified HIV 
PR. Monitoring was performed at 215 nm. DTT and DTTox 
correspond to dithiothreitol and oxidized dithiothreitol, respectively. 
Cleavage products are identified by their amino acid sequence. 

Table 1. Relative cleavage of HIV peptide substrates 

IN, integrase protein. 

*Cleavage sites within the HIV gag-pol polyprotein are designated according to the new nomenclature 
(2), except for the N-terminal product from the pol reading frame (p6*), for which there is no new 
name. CA+ specifies the C-terminally extended capsid protein p25 (24). 

†Relative values of Vmax/Km were determined by using competition experiments. Each value is an 
average of at least three determinations and is reproducible to ±20%. 



the absorbance peaks corresponding to the parent peptide 
and the two cleavage products showed complete conversion 
of parent to product and no losses were observed. To further 
support the specificity of this cleavage, peptide BI-P-127 was 
incubated with material purified from bacteria carrying the 
deletion plasmid (pHIVproP), which does not contain active 
HIV PR. This incubation did not result in any processing of 
the peptide, although 50-fold more protein was used than in 
the experiments with active HIV PR (data not shown). 

In addition to peptide BI-P-127, four other decapeptides, 
corresponding to cleavage sites within the HIV polyprotein, 
were synthesized. Peptide BI-P-136, which contains the 
N-terminal cleavage site of HIV PR (Table 1), was specifically 
cleaved at the Phe/Pro bond to release fragments 
corresponding to the N terminus of the proteinase and the C 
terminus of the upstream pol sequences (p6*). Both cleavage 
products were identified by sequence analysis of materials in 
the respective peaks after HPLC analysis. Fast atom bom- 
bardment mass spectrometry showed total conversion of 
material in a peak of 1164 atomic mass units to material of 570 
atomic mass units (C-terminal fragment) and of 613 atomic 
mass units (N-terminal fragment) corresponding to the parent 
peptide and the two products after cleavage at the Phe/Pro 
bond, respectively. Three more decapeptides corresponding 
to the N- and C-terminal cleavage sites of the capsid protein 
CA/CA+ (24) were synthesized. These peptides contained a 
Tyr/Pro (BI-P-138), Leu/Ala (BI-P-144), or Met/Met (BI-P-140) 
dipeptide as scissile bond. Incubation of these peptides 
with HIV PR gave specific cleavage in all cases (Table 1), and 
all products were identified by sequence analysis. 
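
The fast atom bombardment figures quoted above for peptide BI-P-136 can be checked with a simple mass balance: hydrolysis of one peptide bond adds one water (about 18 mass units) to the combined mass of the products. The sketch below is illustrative only. The function name and the `protonated` switch are assumptions; the switch covers the possibility, not stated in the text, that the reported values are singly protonated ions, in which case the two fragments together carry one more proton than the parent.

```python
WATER = 18.0   # nominal mass of H2O added on hydrolysis of one peptide bond
PROTON = 1.0   # nominal mass of one proton

def mass_balance_residual(parent, fragments, protonated=False):
    """Observed minus expected sum of fragment masses for a single cleavage.

    If protonated is True, every value is treated as an [M+H]+ ion
    (an assumption, not something the text states), so each fragment
    beyond the first contributes one proton more than the parent did.
    """
    extra_protons = PROTON * (len(fragments) - 1) if protonated else 0.0
    expected = parent + WATER + extra_protons
    return sum(fragments) - expected

# Values reported for BI-P-136: parent 1164, products 570 and 613.
print(mass_balance_residual(1164.0, [570.0, 613.0]))                   # 1.0
print(mass_balance_residual(1164.0, [570.0, 613.0], protonated=True))  # 0.0
```

Under either reading the residual is at most one mass unit, which is consistent with a single cleavage and no loss of material.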

In addition to these HIV-specific peptides we studied the 
activity of HIV PR on a peptide that did not contain any HIV 
cleavage site. Peptide BI-P-102 corresponds to the cleavage 
site between the α-subunit of RT and IN of avian sarcoma-leukosis 
virus (Table 1; ref. 25) and contains a Tyr/Pro 
dipeptide as scissile bond but was completely stable to 
incubation with HIV PR under conditions where complete 
cleavage of the HIV-specific peptides occurred (Table 1). 

In a separate set of experiments, we determined the 
relative susceptibility of these peptides to cleavage by HIV 
PR. Relative values of Vmax/Km for all substrates (Table 1) 
were obtained in experiments where two peptides, BI-P-136 
and another, were incubated simultaneously with HIV PR 
and the substrates competed for the active site. This method 
requires fewer experiments than determining absolute values 
of Vmax/Km for each substrate and is not affected by 
variations in enzyme activity in different experiments. Competition 
of peptide BI-P-136 with each of the other five 
peptides resulted in the relative values of Vmax/Km shown in 
Table 1. The best substrate, BI-P-136, was 5-fold more active 
than any other peptide tested, and peptide BI-P-140, con- 
taining a Met/Met dipeptide as the scissile bond, was cleaved 



at a significantly faster rate than the other three HIV-specific 
peptides. 
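
For two substrates competing for the same active site, the standard integrated rate relation is ln([A]/[A]0) / ln([B]/[B]0) = (Vmax/Km)A / (Vmax/Km)B, so a single time point gives the ratio without knowledge of the enzyme concentration. The sketch below applies that relation. It is not necessarily the calculation the authors used, and the function name and the example fractions are hypothetical.

```python
import math

def relative_vmax_over_km(frac_test_remaining, frac_ref_remaining):
    """Ratio (Vmax/Km)_test / (Vmax/Km)_ref from one competition time point.

    Each argument is the fraction of that substrate still uncleaved,
    strictly between 0 and 1, measured in the same reaction mixture.
    """
    for frac in (frac_test_remaining, frac_ref_remaining):
        if not 0.0 < frac < 1.0:
            raise ValueError("fractions remaining must lie strictly between 0 and 1")
    return math.log(frac_test_remaining) / math.log(frac_ref_remaining)

# Hypothetical time point: half of the reference peptide is cleaved while
# about 13% of the test peptide is cleaved (0.5 ** 0.2 of it remains).
ratio = relative_vmax_over_km(0.5 ** 0.2, 0.5)
print(round(ratio, 3))  # 0.2
```

A ratio of 0.2 in this hypothetical case means the reference peptide is processed 5-fold more efficiently than the test peptide.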

The approximate turnover number for HIV PR was estimated 
from the rate of substrate turnover and an estimate of 
the enzyme concentration. Cleavage of 250 µM BI-P-136 was 
complete within 20 min in the presence of ≈200 nM HIV PR. 
This corresponds to a turnover number of ≈1 s⁻¹, if we 
assume that the enzyme was saturated under these conditions 
and the observed rate thus represents Vmax. This value is a 
lower limit for the turnover number since the enzyme could 
not possibly be saturated for the entire course of cleavage of 
BI-P-136. 
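
The estimate can be reproduced by dividing the amount of substrate consumed by the enzyme concentration and the elapsed time. The sketch below assumes the substrate figure of 250 is a micromolar concentration (the reading that reproduces the stated value of about 1 s⁻¹) and that the enzyme stays saturated throughout; the function name is illustrative.

```python
def turnover_number(substrate_molar, enzyme_molar, seconds):
    """Lower-bound turnover number (per second), assuming saturation throughout."""
    return substrate_molar / (enzyme_molar * seconds)

# Assumed: 250 µM peptide fully cleaved in 20 min by about 200 nM enzyme.
k_cat = turnover_number(250e-6, 200e-9, 20 * 60)
print(round(k_cat, 2))  # 1.04
```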

Since HIV PR was shown to be inhibited by high concen- 
trations of pepstatin A (12, 21, 26), we determined the effect 
of pepstatin A on peptide cleavage (using peptide BI-P-136 as 
substrate) by purified HIV PR. These experiments gave an 
IC50 value for pepstatin A of 0.7 µM. 

DISCUSSION 

In this communication we report purification of active HIV 
PR to 90% purity. The enzyme was expressed in bacteria to 
a relatively low level but it was largely soluble and could 
therefore be purified by making use of its basic net charge, 
hydrophobicity, and low molecular mass. These results are in 
contrast to a report showing partial purification of HIV PR 
from E. coli (21), in which the enzyme was highly insoluble. 
Moreover, these authors followed purification of the enzyme 
only by immunoblot analysis and the degree of purity of their 
partially purified proteinase was not stated. 

The purified enzyme was shown to be active on an in vitro 
synthesized gag precursor and on decapeptide substrates. 
Processing of the structural precursor was very efficient, 
giving almost complete cleavage with ≈20 nM enzyme. Thus, 
HIV PR was significantly more active on its natural substrate 
than poliovirus 3C, for which a concentration of 25 µM was 
required to achieve equally good cleavage of its natural 
substrate (17). An efficient proteolytic enzyme is desirable 
for HIV since gag precursor and PR are produced in a ratio 
of about 10:1 (27) and every enzyme molecule must cleave at 
least 50 peptide bonds to achieve complete processing of the 
gag and gag-pol polyproteins. 

Purified HIV PR induced cleavage of five decapeptides that 
contained sequences corresponding to cleavage sites within 
the HIV polyprotein. Specific cleavage of peptides was also 
observed in a very recent study using chemically synthesized 
HIV PR (26). However, no kinetic analysis was performed in 
this study and the experiments were carried out on peptides 
of variable length and at a different pH. These differences 
may account for the different "ranking" of peptides these 
authors observed. Moreover, the chemically synthesized PR 
seems to be considerably less active than PR purified from E. 
coli. This may be due to the presence of a high percentage of 
inactive molecules that accumulated during chemical synthesis. 

On the basis of amino acid comparisons between different 
viruses, a consensus sequence for retroviral cleavage sites 
has been proposed (28), suggesting a generally hydrophobic 
pattern of amino acids surrounding the cleavage site with Tyr 
(Phe)/Pro being the most frequently occurring P1 and P1' 
residues, respectively. Three of our five decapeptide substrates 
match this pattern of amino acids very closely, 
whereas the other two are dissimilar but still contain primar- 
ily hydrophobic residues surrounding the scissile bond. 
Additional substrate determinants are likely to reside in 
amino acids on both sides of the scissile bond but larger data 
sets are required to define these determinants. HIV PR did 
not cleave an oligopeptide containing a Tyr/Pro cleavage site 
that served as a substrate for PR of avian sarcoma-leukosis 
virus (25). On the other hand, avian PR could process several 
oligopeptides corresponding to HIV cleavage sites (25). It 
remains to be seen whether the substrate requirements for 
small peptides generally match those for cleavage of viral 
polyproteins. Accessibility of potential cleavage sites and 
structurally flexible contexts are obvious additional determi- 
nants. 

Cleavage of different dipeptide bonds by one enzyme raises 
the possibility of differences in the relative susceptibility of 
these sites. This could be a regulatory feature in that 
processing may be required for activation or inactivation of 
functional domains within the polyprotein during and after 
assembly of the virion. In competition experiments we 
demonstrated that a peptide corresponding to the N-terminal 
cleavage site of PR was processed significantly faster than all 
other peptide substrates. These results suggest that this 
cleavage, which leads to separation of the structural proteins 
of the nucleocapsid from the replication enzymes, may be a 
very efficient first step in the processing pathway. A similar 
relative order of processing steps is observed in picornavirus 
replication where cleavage between the structural and non- 
structural domains of the viral polyprotein is a very efficient 
first step in the processing cascade (1). Recently, it was 
shown that the CA/NC cleavage can occur at three different 
sites within a 16-amino acid sequence (24). Our results 
indicate that purified HIV PR can cleave at least two of these 
sites and that the Met/Met site may be the preferred cleavage 
site since processing of a peptide containing the Leu/Ala site 
was significantly slower. 

Retroviral proteinases are believed to be aspartic protein- 
ases and it is of interest in this regard that pepstatin A, which 
had been shown to inhibit polyprotein cleavage by retroviral 
enzymes at a very high concentration (>0.1 mM; refs. 9, 12, 
21, and 26), inhibited peptide cleavage by purified HIV PR 
with an IC50 of 0.7 µM. This inhibitory effect is still very weak 
compared to a Ki for pepsin of 4.5 × 10⁻¹¹ M (29) but is 
comparable to the effect of pepstatin A on renin (Ki = 0.1-1 µM; 
ref. 29), consistent with the classification of the 
retroviral enzymes as aspartic proteinases. 

ABSTRACT The protease of the human 
immunodeficiency virus type 1 (HIV1) was expressed 
both intracellularly and extracellularly 
in Saccharomyces cerevisiae. Intracellular expression 
of the protease was achieved by fusing 
a 179 amino acid precursor form of the protease 
to human superoxide dismutase (hSOD). Self-processing 
of the viral enzyme from the hybrid 
precursor was demonstrated to occur within 
the yeast host. Secretion of the protease was 
achieved by fusing the leader sequence of yeast 
α-factor to the precursor form of the protease 
or to the 99 amino acid mature form of the protease. 
Authentic and active forms of the retroviral 
enzyme were detected in yeast supernatants 
of cells expressing the precursor or the 
mature form of the protease. A D25E active site 
variant of the retroviral enzyme exhibited diminished 
autocatalytic activity when expressed 
intracellularly or secreted from yeast. The wild-type 
protease was active in an in vitro assay on 
the natural substrate, myristylated gag precursor, 
Pr53gag. Correct processing of Pr53gag at 
the Tyr138-Pro139 junction was confirmed by 
amino terminal sequence analysis of the resulting 
capsid protein (CA, p24). The secreted protease 
was purified to homogeneity from yeast 
media using preparative isoelectric focusing 
and reverse-phase HPLC. Amino terminal sequence 
analysis showed a sequence beginning 
at amino acid 1 of the mature enzyme (Pro) and 
another sequence beginning at amino acid 6 
(Trp). This shorter sequence may represent a 
natural autolytic product of the protease. 

Key words: aspartyl protease, HIV/AIDS, yeast 
expression, polyprotein processing, 
α-factor, secretion 

INTRODUCTION 

Proteases are essential to the life cycle of all 
known retroviruses. The polyproteins produced by 
the compact viral genome must undergo limited pro- 
teolysis by processing enzymes, which release the 
various proteins required for the replication and as- 
sembly of the virus. 1 In order to ensure the correct 



processing of viral polyproteins, retroviruses fre- 
quently encode highly specific proteases 2 that are 
capable of recognizing unique amino acid sequences 
and hydrolyzing specific peptide bonds at the junc- 
tions between the viral proteins. 3 A potential means 
of intervention of the viral life cycle would involve 
inhibition of these hydrolytic events. 

The etiological agent of AIDS is a retrovirus re- 
ferred to as human immunodeficiency virus (HIV). 4 
Genetic analysis of the molecular organization of 
the HIV genome has revealed that, as in other retroviruses, 
it consists of the gag, pol, and env major 
genes encoding several structural proteins and enzymatic 
activities. 5-7 These genes are expressed as 
polyproteins that require posttranslational proteolysis 
to yield the mature viral proteins. Processing of 
HIV type 1 (HIV1) gag and gag/pol precursors involves 
several proteolytic steps which are at least in 
part mediated by a viral protease encoded at the 5' 
end of the pol gene 8 and expressed between gag and 
pol in the gag/pol precursor. Evidence that the viral 
proteolytic activity is crucial in viral replication is 
the loss of infectivity detected in virions encoding a 
mutant protease. 9 In order to gain a better 
understanding of the functional and structural aspects of 
this viral enzyme, reagent quantity levels of the au- 
thentic enzyme must be available for appropriate 
biophysical analysis. By using a heterologous ex- 
pression system to overproduce the cloned gene 
product, the dependence upon natural sources of in- 
fected cells can be avoided. Furthermore, advantage 
can be taken of the expression system to produce 
genetically altered gene products that address spe- 
cific structure-activity questions relating to the 
mode of action and substrate specificity of the en- 
zyme. 

Bacterial expression of various DNA fragments 
encoding different HIV1 protease precursors revealed 
that the 10 kDa retroviral enzyme is autocatalytically 
released from the higher molecular 
weight precursors. [...] This self-processing activity 
is abolished [...] the aspartic acid located at position 
[...] a member of the aspartyl protease family. The 
[...] reported [...] structures of 
the HIV1 protease and the related Rous sarcoma 
virus protease confirms this classification. 

[...] mutational analysis of the viral genome 
initially identified the [...] protease. 9 Despite this 
initial success, virtually all subsequent studies on the 
expression and mutagenesis of the protease have 
[...] in bacteria. Yeast [...] an alternative method 
[...] purifying this protein and [...] variant [...] 
host for the expression of recombinant proteins. Sophisticated 
plasmids have been constructed that [...] 
transcriptional control [...] protease-deficient 
hosts and efficient transformation techniques 22-24 
can be used to further increase the levels of production 
[...] protein sequences required for the 
secretion of the yeast pheromone α-factor have been 
identified and can be used for the secretion of 
various cloned gene products. [...] Heterologous expression 
of a number of human [...] 
myristylation and acetylation 33,34 of [...] 
retroviral and human proteins have been 
shown to [...] anticipate 
that this microorganism will continue to be a 
powerful tool for the biosynthesis and analysis of 
[...] protease both [...] 
have established in vivo and in vitro assays using its 
naturally occurring substrate [...] further 
characterization of the enzyme and permit the 
eventual design of inhibitors that may [...] 
symptoms of AIDS. 35,36 

MATERIALS AND METHODS 
Plasmid Constructions and 
Genetic Manipulations 

[...] of 99 amino 
acids corresponding to the putative protease. The 
[...] plasmid was constructed 
from 13 oligonucleotides with 22 bp overlaps. [...] 
codons [...] 
throughout the coding sequence to facilitate future 
manipulations of the gene. Sequences were included 
at the 5' end of the synthetic gene to regenerate an 
XbaI site and to encode the [...] 
(LeuAspLysArg) of [...] included 
at the 3' end to encode [...] 
and to regenerate a SalI cloning site (Fig. 1A). 

Plasmid pPR179 encodes the [...] 
HIV1 protease together with 55 additional amino 
acid residues at the N-terminus and 25 additional 
amino acids at the C-terminus. The DNA encoding 
this 179 [...] 
adapter was added to the 5' end of the DNA fragment 
to regenerate an XbaI restriction site, to encode 
the KEX2 [...] site and three additional 
residues (PheArgGlu), and to regenerate 
a BglII restriction site. A synthetic adapter 
was added to the 3' end of the DNA [...] to 
regenerate a BalI restriction site, [...] to encode an additional 
Pro residue of the pol gene and a stop codon 
(TAG), [...] to regenerate a SalI restriction site 
[...]. These reengineered XbaI/SalI DNA fragments 
encoding the protease were used for 
[...] involving secretion of [...]. 

For external localization of the protease, the 
[...] fragments [...] 
were ligated to the α-factor leader sequences and to 
[...] ADH2/GAPDH [...]. The 
resulting DNA fragments were [...] 
ura3-52, pep4-3, his4-[...] 
media essentially as described. 33 

For [...] localization of the protease, a plasmid 
pSOD/PR179 was made in which the [...] 
terminus of human superoxide dismutase (hSOD). A 
[...] restriction [...] 
encode a Met, Ala, and three amino acids 
[...] product (PheArgGlu) and to regenerate 
a BglII restriction site. The BalI/SalI synthetic 
adapter described above was added to the 3' end of 
[...] encoding [...] sequences. This DNA 
[...] to generate pSODCF2/
PR179. This recombinant plasmid creates an in-

[Fig. 1 diagram; labels recoverable from the scan. A. pPR99: promoter, α-factor leader, HIV1 protease synthetic gene (99 aa); the 5' adapter encodes LeuAspLysArg and the 3' adapter a stop codon. B. pPR179: ADH2/GAPDH promoter, α-factor leader, HIV1 pol sequences (52 aa), HIV1 protease (99 aa), HIV1 pol sequences (24 aa); the 5' adapter encodes LeuAspLysArgPheArgGlu with the KEX2 site and the 3' adapter encodes Pro and a stop codon. C. pSOD/PR179: promoter, SOD, HIV1 pol sequences (52 aa), HIV1 protease sequences (99 aa), HIV1 pol sequences (24 aa); the 5' adapter encodes MetAlaPheArgGlu and the 3' adapter encodes Pro and a stop codon.] 

Fig. 1. Yeast recombinant plasmids for the 
expression of HIV1 protease in S. cerevisiae. (A) Expression cassette 
for plasmid pPR99. The cassette encodes a 99 amino acid 
HIV1 protease. (B) Expression cassette for plasmid pPR179. The 
cassette encodes a viral protease containing 55 additional amino 
acid residues at its N-terminus and 25 additional amino acids at its 
C-terminus. Both plasmids include the yeast pheromone α-factor 
signal/leader sequence for secretion of the protease and the 
KEX2 processing site, which is denoted with an arrow. (C) Expression 
cassette for plasmid pSOD/PR179. The cassette encodes a 
fusion protein for internal expression in yeast. The hybrid protein 
consists of 333 amino acids with the C-terminus of hSOD fused to 
the N-terminus of the HIV1 viral protease precursor. These cassettes 
were ligated into the yeast expression vector pBS24.1, 
which included β-lactamase, leu2-d and ura3 markers, 2µ sequences, 
and the α-factor transcription terminator. LDT represents 
the amino acids Leu, Asp, and Thr located at the putative 
active site of the HIV1 protease. Asterisks indicate sites where 
aspartic acid 25 was replaced with a glutamic acid. All recombinant 
plasmids include the ADH2/GAPDH promoter to regulate the 
expression of the viral protease. 



frame fusion of hSOD to the pol gene products with 
the carboxyl terminal Asn of hSOD being replaced 
[...] was isolated and [...] into the 
plasmid pSIl 22 to provide the ADH2/GAPDH promoter. 
Finally, the BamHI/SalI fragment of the resulting 
plasmid pSIl/PR179 was introduced into the 
yeast vector pBS24.1 described above. The resulting 
recombinant plasmid, pSOD/PR179 (Fig. 1C), was 
prepared and used to transform spheroplasts of the 
protease-deficient strain S. cerevisiae AB116 (prc1-407, 
prb1-1122, pep4-3, leu2, trp1, ura3-52, [cir°]). 
Yeast transformation, selection, and growth of recombinants 
were achieved as described elsewhere 20 
except that YEP media supplemented with 2% 
[...] used instead of uracil-deficient media 
for intracellular expression from the ADH2/GAPDH 
promoter. 

Oligonucleotide Synthesis and 
Site-Directed Mutagenesis 

Oligonucleotides were synthesized using solid-phase 
phosphoramidite chemistry on an Applied 
Biosystems 380A DNA synthesizer. The deprotected 
oligonucleotides were purified by electrophoresis on 
15% polyacrylamide gel containing 8 M urea and 
[...] chromatographed through a Sep-Pak C18 cartridge, 
lyophilized, and suspended in water. The mutagenic 
oligonucleotide 5' AGCTCTATTAGAAACAGGAG[...] 3' 
was used to prime the synthesis of a DNA 
strand encoding an HIV1 protease template with a 
glutamic acid codon (GAA) in place of the aspartic 
acid codon (GAT) at position 25 in the viral protease. 
Mutagenesis, transformation of E. coli CJ236, 
screening, and template sequencing were done as 
previously reported. Yeast plasmids encoding [...] 
were obtained after replacing [...] 522 
bp BglII/SalI fragment of pPR179 and pSOD/PR179 
with identical DNA fragments except for the glutamic 
acid codon replacement. The resulting [...] 
were denoted pPR179 D25E and 
pSOD/PR179 D25E, respectively, according to standard 
nomenclature for mutant proteins. 42 

Detection of HIV1 Protease Expressed 
in Yeast 

Cultures of S. cerevisiae AB110 grown in uracil[...] 
media for 24-48 hours were centrifuged at 
[...]000g for 15 minutes and proteins in 10 ml of supernatant 
were precipitated with 5 ml of 50% [...] 
which contained 2 mg/ml deoxycholate. 
After incubation at 4°C for 30 minutes [...] 
12,000g for 30 minutes, washed with acetone, dried, 
and suspended in 40 µl of sample buffer. Yeast 
[...] were harvested as described for AB110 
and 1 OD650 of cells were lysed by repeated boiling 
[...] glycerol, 3% SDS, 0.02% bromophenol 
blue. Proteins were fractionated by SDS-PAGE 
in 15% polyacrylamide gels and the proteins were 
[...] Bio-Rad Trans-Blot apparatus. 
The HIV1 [...] detected on immunoblots using 
[...] raised against concentrated 
yeast supernatant from cells expressing 
an HIV1 polypeptide encompassing the 78 C-terminal 
amino acids of the protease and the 37 [...] 
residues of the reverse transcriptase. 20 A 
[...] peroxidase (HRP, Boehringer Mannheim) was used 
for immunodetection. The rabbit polyclonal antibodies 
recognize secreted yeast proteins other than the 
[...]. The human 
SOD/PR fusion polypeptide was [...] 
mouse monoclonal antibodies raised against hSOD 
as primary antibodies and goat anti-mouse antibodies 
conjugated to HRP (Tago) as secondary antibodies. 

In Vitro Assay for HIV1 Protease 

HIV1 protease activity was assayed by incubating 
yeast recombinant myristylated Pr53gag with [...] 
protease and monitoring the formation of matrix 
[...] using SDS-PAGE and immunoblotting. The specific 
viral proteins were visualized on immunoblots 
[...] infected patients 35 [...] 
conjugated to HRP (Tago) as the second antibody. Total 
lysates of HIV1-infected cells were included as 
standards to visualize HIV1 viral proteins. Reactions 
were carried out in 10 mM Tris-HCl, pH [...], 
130 mM NaCl, 1 mM EDTA, and 1 [...] 
media from transformed yeast AB110 or in partially 
[...] lysates from transformed yeast AB116. 

[...] expressed protease was concentrated 
[...] media of yeast AB110 expressing 
[...] removed from the media at 
[...] supernatant then concentrated 
at 4°C approximately 20-fold using either 
a Centricon 10 or a Centriprep 10 (Amicon) with a 
molecular weight cutoff of 10,000. 

Partial purification of the internally expressed 
[...] from [...] 
cell cultures were pelleted 
by centrifugation and frozen at -20°C. The cells 
were resuspended on ice in 50 mM Tris-HCl, pH [...], 
[...] KCl, and 0.5% Triton X-100 and sonicated using 
10-second bursts until the viscosity of the solution that 
resulted from liberation of chromosomal DNA was 
significantly reduced. The lysed cells were then clarified 
by centrifugation at 5000g for 20 minutes. To 
remove DNA from the sample, neutralized protamine 
sulfate was added to 0.5% (w/v), the solution was 
incubated at room temperature for 15 minutes, and 
centrifuged at 7000g for 15 minutes. The supernatant 
was then adjusted to 15% ammonium sulfate 
using a standard solution. After 30 minutes at room 
temperature, the precipitate was removed by centrifugation 
and the supernatant was brought to 40% 
ammonium sulfate to precipitate the protease. Following 
a 1 hour incubation at room temperature 
and centrifugation at 7000g for 15 minutes, the pellet 
was collected, resuspended in 10 mM Tris-HCl, 
pH 7.5, and dialyzed exhaustively against 10 mM 
Tris-HCl, pH 7.5 buffer. This resulted in a 20-fold 
purification of the protease as monitored by OD280. 

Amino-Terminal Amino Acid Determinations 

Sequential Edman degradation of the viral products 
resulting from specific hydrolysis by the viral 
protease was performed using an Applied Biosystems 
Model 470A gas-phase protein sequenator. The 
sequenator was equipped with an on-line phenylthiohydantoin 
(PTH) amino acid analyzer. The 
PTH-amino acids were separated by reverse-phase 
chromatography on a Brownlee C18 column. 46 To determine 
the amino-terminal sequence of the processed 
capsid protein p24, 15 µg of myristylated 
Pr53gag was digested with concentrated media from 
yeast transformed with either pPR99 or pPR179 using 
the conditions described for the in vitro assay. 
The length of incubation ranged from 10 to 48 hours. 
The products of the digestion were then separated by 
SDS-PAGE. After electrophoresis, proteins were 
electroblotted onto PVDF membranes (Immobilon-P 
Transfer, Millipore) in 10 mM 3-[cyclohexylamino]-1-propanesulfonic 
acid (CAPS), pH 11.0, 10% methanol. 
Membranes were washed in deionized water, 
stained with 0.1% Coomassie Brilliant Blue R-250 
in 50% methanol, destained, dried, and stored at 
-20°C. Strips of PVDF membrane containing the 
capsid protein were cut out and the protein was sequenced 
following standard procedures. 47 Amino-terminal 
sequence analysis of the HIV1 protease 
was carried out on purified PR99. 

Purification of HIV1 Protease Secreted Into 
Yeast Media 

Twelve liters of yeast media from cells transformed 
with pPR99 that had been grown for 48 
hours was concentrated to 2 liters using a Pellicon 
tangential flow concentrator (Amicon) equipped 
with a PTGC cassette with a 10,000 molecular 
weight cutoff. The sample was concentrated further 
to 200 ml on an Amicon ultrafiltration cell using a 
YM5 membrane with a 5000 molecular weight cutoff. 
Following exhaustive dialysis against water, the 
solution was adjusted to 0.1% Triton X-100 and 1 
mM PMSF. Protein separation by preparative isoelectric 
focusing was accomplished on a Rotofor unit 
(Bio-Rad). A 55 ml sample of concentrated and dialyzed 
media was mixed with 1.5 ml of pH 3-10 ampholytes 
(40% solution, Bio-Rad), and separated on 
the Rotofor according to the manufacturer's instructions. 
Twenty fractions of approximately 2 ml each 
were collected and their pH and OD280 were measured. 
The presence of the viral protease in these 
fractions was determined on immunoblots and found 
to be localized in fractions ranging from pH 9 to 
12. These fractions were pooled, placed in dialysis 
bags, and concentrated against polyethylene glycol 
(PEG). The PEG was then removed by dialysis 
against water. The concentrated sample was then 
subjected to reverse-phase HPLC using a C3 column 
(Protein Plus, Dupont) developed with a water:acetonitrile 
gradient containing 0.1% trifluoroacetic 
acid (TFA). The HIV1 protease eluted at approximately 
55% acetonitrile. Fractions of 1 ml each were 
collected and those containing the protease, as monitored 
by SDS-PAGE and silver staining, were pooled 
and lyophilized. The approximate yield of 
highly purified material from 12 liters of media was 
1 mg. 

The specific activity of the purified protease was 
measured using two synthetic peptides and an 
HPLC-based discontinuous assay. 48 The sequences 
of the peptides were Ala Thr Leu Asn Phe Pro Ile Ser 
Pro Trp and Arg Ser Leu Asn Tyr Pro Gln Ser Lys 
Trp. Aliquots of peptide substrates (10 µg) were incubated 
with purified protease (0.5 µg) at 37°C for 1 
minute to 24 hours in 50 mM sodium phosphate 
buffer pH 5.5 containing 4 mM EDTA, 20 mM DTT, 
and 25 mM NaCl in a final volume of 20 µl. The 
reaction mixtures were fractionated by reverse-phase 
HPLC on a Vydac C18 column (250 x 4.6 mm) 
using an aqueous gradient of acetonitrile from 0 to 
100% containing 0.1% TFA and a flow rate of 1 ml/minute. 
The peptides were detected by UV absorbance 
at 215 nm. The amount of peptide substrate 
was calculated from the absorption profiles and plotted 
versus time. A specific activity was calculated 
from the slope of the curve in the linear range. 
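
The slope calculation described above can be sketched as an ordinary least-squares fit of substrate remaining against time, restricted to the linear range and normalized to the amount of enzyme. The code below is a minimal illustration: the function names, the units (µg of substrate per minute per µg of protease) and the data points are all hypothetical, and the source does not state how the authors performed the fit.

```python
def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def specific_activity(times_min, substrate_ug, enzyme_ug):
    """Substrate consumed per minute per µg of enzyme over the fitted range."""
    rate = -least_squares_slope(times_min, substrate_ug)  # µg substrate per min
    return rate / enzyme_ug

# Hypothetical linear-range data: 10 µg peptide, 0.5 µg protease.
times = [0.0, 1.0, 2.0, 3.0]        # minutes
remaining = [10.0, 9.5, 9.0, 8.5]   # µg substrate left
activity = specific_activity(times, remaining, 0.5)
print(activity)  # 1.0
```

Only time points from the initial, linear part of the progress curve should be passed in; later points, where substrate depletion slows the reaction, would bias the slope downward.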

RESULTS 

The expression and secretion of HIV1 protease in 
yeast were accomplished by cloning two different 
DNA fragments encoding the viral enzyme into a 
yeast vector that provided a strong inducible promoter 
for expression and the signal peptide and processing 
signals of the yeast pheromone α-factor for 
secretion. Recombinant plasmids pPR99 and 
pPR179 encode the 99 amino acid protease and a 179 
amino acid protease precursor, respectively (Fig. 1 A 
and B). 

Fig. 2. [...] and pPR179 (lane 3) [...] were grown in uracil-deficient 
media at [...]. After 48 hours of incubation, cultures were 
[...] proteins in the supernatants were precipitated with 
TCA and [...] electrophoresed in a 15% polyacrylamide gel containing 
SDS. Immunoblot analysis was performed using a rabbit [...]. 
[...] markers were loaded in lane 5. PR is the protease. 
PRC is the protease plus 25 amino acid C-terminal extension. The 
level of total secreted protein varied for each of the different 
transformed yeast shown in lanes 1 through 4 as determined [...] 

A 10 kDa polypeptide was detected immunologically 
in yeast media from cells expressing either 
pPR99 (Fig. 2, lane 1) or pPR179 (Fig. 2, lane 
3). This protein comigrates with HIV1 protease 
[...] as shown in lane 6. A 10 kDa 
band is not observed in media from yeast trans- 
formed with a plasmid that does not encode the pro- 
tease (Fig. 2, lane 4). The predicted molecular 
weight of the 99 amino acid polypeptide is 10,792 
winch is in agreement with the observed size as de^ 
termined by its migration in SDS polyacrylamide 
gels relative to proteins with known molecular 
weights. This shows that the protease is expressed 
m yeast from the plasmid construction and indicates 
that the a-factor leader sequence provides an effi- 
cient system to export the viral enzyme from the 
cell, rhe use of a-factor sequences to export recom- 
binant proteins from yeast has been reported for 
other gene products including human epidermal 
growth factor, 24 carboxypeptidase A, 26 and human 
tibroblast growth factor. 3 However, the HIV1 pro- 
tease differs from these proteins m that it is nor- 
mally exported from cells by a mechanism involvin K 
viral budding. B 

The plasmid pPR179 encodes an HIV1 protease containing 55 additional amino acid residues
at its N-terminus and 25 additional amino acids at its C-terminus. However, a protein band
of approximately 10 kDa was detected in media of yeast transformed with this plasmid
(Fig. 2, lane 3). This [illegible] proteolytic processing resulting in the 10 kDa mature
protease. This event probably represents an [illegible] process mediated by the protease
itself, since no mature protease was detected in yeast supernatants of cells harboring a
plasmid encoding a variant form of the protease, pPR179 D25E (Fig. 2, [lane illegible]).
In this variant enzyme, the active site aspartic acid has been replaced with a glutamic
acid in order to reduce the activity of the protease with minimal perturbation of the
enzyme structure. The autoprocessing event that is occurring either along the secretory
pathway of the yeast or in the extracellular media has also been observed intracellularly
in bacteria. [citations illegible]

[Illegible] yeast supernatant of cells expressing the [illegible] protease, a 13.5 kDa
protein band was revealed by the rabbit immune sera. This polypeptide, [illegible] PRC,
may represent a 124 amino acid residue intermediate of 13,534 that would be formed if the
25 amino acid C-terminal extension was not removed while the N-terminal 55 amino acid
extension was removed. A minor band of approximately 16.5 kDa was also detected, PRN,
which would result if the C-terminal extension was removed but the N-terminal extension
was not. [Illegible] amino acid intermediate of 16,950 Da. The full-length precursor of
19.7 kDa is not observed, presumably due to its extreme lability. Yeast expression of
pPR179 D25E results in the presence of the two intermediates PRC and PRN, suggesting that
the Asp to Glu substitution leads to a protease with reduced, but measurable,
self-processing activity.

[Text missing] acid precursor was cloned in a yeast vector for internal location of the
protease, the 10 kDa protease band was not detected immunologically (results not shown).
Since no difference in cell growth resulted upon transformation of yeast with the
[illegible] expression vector, the viral protease precursor presumably was degraded by
intracellular proteases. Expression of the HIV1 protease in yeast required fusion of the
precursor sequences to the DNA sequences encoding hSOD. Analysis of [illegible] products
expressed in S. cerevisiae strain AB116 containing either pSOD/PR179 or pSOD/PR179 D25E
is shown in Figure 3. These plasmids encode a fusion protein of 333 residues with a
calculated molecular weight of 35,555, composed of 153 [illegible] with the C-terminal Asn
re[illegible] 179 amino acids of the protease precursor. A 10 kDa polypeptide was detected
in lysates of yeast cells containing the pSOD/PR179 recombinant plasmid [illegible]. This
10 kDa protein comigrates with the HIV1 protease expressed in bacteria (Fig. 3A, lane 2).
This suggests that the viral proteolytic processing occurred within the yeast host to
generate a protease of correct molecular weight. The 10 kDa band is not observed in
lysates of yeast cells transformed with a plasmid that does not encode the protease
(Fig. 3A, lane 3). Lysates of yeast cells harboring pSOD/PR179 D25E do not show the 10 kDa
protein band (Fig. 3A, lane 6) although a 35 kDa band (SOD-PRNC) is detected by antibodies
to the HIV1 protease (Fig. 3A, lane 5) and by antibodies to hSOD (Fig. 3B, lane 5). This
band probably represents a fused hSOD-HIV1 protease precursor that was not processed by
the variant protease. No such precursor was detected in cells harboring pSOD/PR179
(Fig. 3A, lane 4; Fig. 3B, lane 4), suggesting that this molecule is rapidly cleaved
within the yeast cell to form the mature protease. A 209 amino acid intermediate of
molecular weight 22,021, representing hSOD bound covalently to HIV1 sequences located
immediately upstream of the amino terminus of the protease, is not observed. Presumably
this hybrid molecule is very susceptible to the action of yeast proteases and is rapidly
degraded.

The yeast expressed HIV1 protease was shown to be enzymatically active in an in vitro
assay that uses recombinant myristylated Pr53gag precursor as a substrate. 32
Myristylation is known to occur in a number of mammalian retroviral gag precursors 53 and
may be essential for viral polyprotein transport to the plasma membrane and subsequent
virion formation. 52,53 Figure 4A shows that an active HIV1 protease is present in
supernatants of yeast cells expressing either pPR99 (Fig. 4A, lanes 5,6) or pPR179
(Fig. 4A, lanes 7,8), as evidenced by the generation of major capsid (p24) and matrix
protein (p17) that comigrate with their counterparts expressed in vivo (lane 1). At least
two other minor reactive species of approximately 25 and 39 kDa, which may represent
processing intermediates, 19,54 were also detected by the AIDS sera. Neither p16gag nor
its putative processed products, nucleocapsid protein (NC, p9) and p[illegible], were
detected, presumably due to their rapid turnover or low immunogenicity. The highly
purified Pr53gag (Fig. 4A, lanes 3,4) was nonspecifically cleaved by a yeast protease
present in the media of yeast transformed with a plasmid not encoding the viral protease
(Fig. 4A, lanes 9,10). The HIV1 protease is insensitive to 1 mM PMSF, which was included
in the in vitro assays to avoid nonspecific degradation, and is partially or totally
inhibited by 1 or 10 mM pepstatin A, respectively (results not shown).

Fig. 3. Detection of HIV1 protease expressed in S. cerevisiae AB116 by immunoblot of yeast
total cell lysates. (A) Immunoreactivity with rabbit serum containing antibodies against
the HIV1 protease. (B) Immunoreactivity with mouse monoclonal antibodies to human
superoxide dismutase. Cells grown at 30°C in YEP media supplemented with 2% glucose for
43 hours were lysed and processed as described in Materials and Methods. Lane 1:
Prestained molecular weight markers. Lane 2: Bacterial expressed HIV1 protease. Lane 3:
Lysate of yeast containing pBS24.1. Lane 4: Lysate of yeast expressing pSOD/PR179.
Lane 5: Lysate of yeast expressing pSOD/PR179 D25E. SOD-PRNC is the 333 amino acid hybrid
precursor consisting of hSOD and HIV1 pol amino acid sequences.

When the same protocol is followed to test the enzymatic
activity of the internally produced HIV1
protease (Fig. 4B), essentially identical results are
obtained. Capsid and matrix proteins are formed
Fig. 4. Legend appears on page 332.

TABLE I. Processing Sites of HIV1 gag and gag/pol Polyproteins*

                                                                            Cleavage site
Polypeptide junction                                           Positions   P4  P3  P2  P1 | P1' P2' P3' P4'
Matrix protein p17 (MA) | capsid protein p25/p24 (CA)          135-142     S   Q   N   Y  | P   I   V   Q
Capsid protein p25 (internal cleavage)†                        368-373     A   R   V   L  | A   E   A   M
Capsid protein p25 (CA) | p16                                  380-387     A   N   I   M  | M   Q   R   G
p16: nucleocapsid protein p9 (NC) | p7                         451-458     P   G   N   F  | L   Q   S   R
Pol precursor | protease (PR)                                  53-60       S   F   N   F  | P   Q   I   T
Protease (PR) | protease p10* (internal cleavage)‡             58-65       Q   I   T   L  | W   Q   R   P
Protease (PR) | reverse transcriptase p66 (RT)                 152-159     T   L   N   F  | P   I   S   P
Reverse transcriptase p66 (RT): p51 | p15                      No information available
Reverse transcriptase p66 (RT) | integration protein p32 (IN)  712-719     R   K   V   L  | F   L   N   G

* Amino acid sequence data are from reference 6. The numbers listed with the one-letter code amino acids represent the positions within
the gag or pol precursors. Cleavage sites were defined on the basis of amino-terminal amino acid determinations obtained in this work
as well as on protein sequence information reported previously. 6, 10, [remaining citations illegible]
†The internal cleavage to generate p24 from p25 occurs at the C terminus of p25.

‡Whether this cleavage is mediated by the viral encoded protease and/or has some biological significance is currently investigated
in our laboratory. The nomenclature for amino acid residues at the cleavage sites has been reported elsewhere. 8 The standardized
two-letter notation proposed for retroviral proteins [citation illegible] was followed when possible.



along the course of the incubation when extracts were used from yeast cells transformed
with pSOD/PR179 (Fig. 4B, lanes 6,7). Long-term incubations resulted in the disappearance
of the 39 and 25 kDa polypeptides to form the 24 kDa capsid protein and the 17 kDa matrix
protein (Fig. 4B). These results not only illustrate the nature of intermediates of those
products, but also provide evidence that the viral protease can cleave capsid protein p25
at its leucine-alanine junction. 6 Cleavage of this peptide bond by a synthetic HIV1
protease has been demonstrated. 55 Moreover, removal of the 14 carboxyl residues from
capsid protein p25 to yield capsid protein p24 has also been reported to occur within
HIV1-infected cells. 64 As expected, no specific processing of Pr53gag was detected when
extracts were used that contained a plasmid that did not express the viral protease
(Fig. 4B, lanes 2,3). When Pr53gag was incubated with extracts from cells expressing the
variant protease, PR D25E, no processing activity was detected (results not shown),
suggesting that aspartic acid 25 is required for the enzymatic activity of the protease on
Pr53gag precursor in the in vitro assay.

Fig. 4. Immunological analysis of in vitro processing of the HIV1 Pr53gag precursor by
yeast expressed HIV1 protease. (A) Secreted protease. One microgram of myristylated
Pr53gag precursor was incubated at 20°C with yeast concentrated supernatants in 130 mM
NaCl, 1 mM EDTA, 1 mM PMSF, 10 mM Tris-HCl, pH 7.0-7.5. After 48 hours of incubation,
samples were taken and processed for immunoblotting using AIDS patient sera. HIV1 infected
total cell lysate 45 (lane 1). Molecular weight markers (lane 2). Pr53gag incubated in
buffer pH 7.0 (lane 3) or pH 7.5 (lane 4). Pr53gag incubated at pH 7.0 (lane 5) or pH 7.5
(lane 6) with yeast supernatants of cells expressing pPR99. Pr53gag incubated at pH 7.0
(lane 7) or pH 7.5 (lane 8) with yeast supernatants of cells expressing pPR179. Pr53gag
incubated at pH 7.5 (lane 9) or pH 7.0 (lane 10) with yeast supernatant of cells harboring
the vector pBS24.1. (B) Intracellularly expressed protease. Assays were carried out in
130 mM NaCl, 1 mM EDTA, 1 mM PMSF, 10 mM Tris-HCl, pH 7.0, at 20°C for 24 or 48 hours.
Products of digestion were fractionated on 15% polyacrylamide gels containing SDS.
Immunoblot analyses were performed using AIDS patient sera. HIV1 infected cell total
lysate (lane 1). Pr53gag incubated for 24 hours (lane 2) or 48 hours (lane 3) with cell
extracts of yeast containing the pBS24.1 vector. Pr53gag incubated in buffer for 48 hours
(lane 4). Molecular weight markers (lane 5). Pr53gag incubated for 24 hours (lane 7) or
48 hours (lane 6) with extracts of cells expressing pSOD/PR179.

To unequivocally demonstrate that the yeast expressed HIV1 protease can correctly process
myristylated Pr53gag, the N-terminal sequence of the enzymatically produced capsid protein
p24 was determined. The sequence Pro-Ile-Val-Gln-Asn-Leu-Gln-Met-Val was obtained and
compared with the predicted amino acid sequence of the capsid protein p24. Both
polypeptides show identical residues at all positions where the amino acid sequence was
determined for capsid protein p24, confirming that the viral enzyme expressed and secreted
in yeast efficiently recognized and cleaved the predicted tyrosine-proline peptide bond
between the matrix protein and capsid protein (Table I).

The scheme chosen for purification of the viral protease takes advantage of two unusual
properties of this protein, namely the high isoelectric point and the very hydrophobic
nature. Consistent with a predicted isoelectric point of 9.95, the protease migrated to a
region between pH 9 and 12 in a preparative isoelectric focusing separation (Fig. 5). Few
contaminating proteins were found in this pH range [text missing] proteins have much lower
isoelectric points. This appa[text missing] volume which is further concentrated against
PEG, dialyzed, and subjected to reverse-phase HPLC. The column used was developed using a
water:acetonitrile gradient, and, as expected for a hydrophobic protein, the protease
eluted at an acetonitrile concentration of approximately 55% (Fig. 6A). Fractions eluting
between 26 and 29 minutes consisted of [illegible] protease as [illegible] by silver
staining of an aliquot separated on SDS-PAGE (Fig. [illegible], lane 7). Incubation of the
purified protease with the synthetic peptide Ala Thr Leu Asn Phe Pro Ile Ser Pro Trp at
pH 5.5 results in the specific hydrolysis [text missing] peptide Arg Ser Leu Asn Tyr Pro
Gln Ser Lys Trp under the same conditions results in the specific hydrolysis of 54
nmol/minute/mg protease. These values are in the range of those observed for enzyme
preparations of bacterially expressed HIV protease on similar substrates. 48,61

To determine whether the yeast-secreted HIV1 protease represents the 99 amino acid residue
mature product, amino-terminal sequence analysis was performed on the purified material.
Shown in Figure 7 [illegible] that were determined for the yeast expressed HIV1 protease.
Identical amino acids to those of the HIV1 protease 8 were found for the 10 amino-terminal
residues of the [illegible] protease. A [illegible] sequence starting at tryptophan 6 was
also determined by sequence analysis [illegible]. This 94 amino acid polypeptide,
p10*(94), which lacks the 5 amino-terminal residues, may have been generated
autocatalytically [illegible] represent a biologically significant [illegible].

Fig. 5. [Legend partly illegible.] [Text missing] and loaded on a Rotofor unit (Bio-Rad).
The pH (closed diamonds) and the OD [illegible] were determined for each of the 20
fractions obtained. Horizontal axis: fraction number.
DISCUSSION 

[Text missing] expression of a virally encoded protein from a portion of the viral genome
can provide reagent levels of the protein for biochemical analyses. Besides avoiding the
biohazard of working with large quantities of virus-infected cells, the expression system
permits an analysis of the protein using current methods of molecular genetics. Bacterial
expression systems have been established for biosynthetic production of the HIV1
protease. 10-16,48 However, internal localization of the protease has resulted in toxic
side effects of the protease, instability of the enzyme to bacterial proteases, solubility
problems (inclusion bodies), and difficulties associated with purifying low levels of a
protein from a total E. coli extract. These difficulties have frustrated attempts to
isolate reagent quantities of the HIV1 protease and its variants. Attempts to secrete the
protease from E. coli, which may circumvent some of these problems, have thus far met with
little success. We therefore took advantage of the eukaryotic microorganism Saccharomyces
cerevisiae to express the retroviral aspartyl protease. S. cerevisiae has been used to
efficiently express a number of engineered proteins intracellularly 19,20,22-24,30-32 or
extracellularly. 21,25-27,33 When a DNA fragment encoding this HIV1 protease was fused
in-frame to hSOD for internal localization in yeast, a mature form of the viral enzyme was
detected immunologically. This result indicates that the autocatalytic event that has been
shown to occur in bacteria 10-12,48 can also take place within yeast. A hybrid polypeptide
consisting of hSOD and HIV1 pol amino acid sequences that encode the protease is
accumulated within the cell when the aspartic acid located at the putative active site of
the protease 56 was replaced by a glutamic acid residue (PR D25E). This observation
provides new evidence that this aspartic acid residue is required for the catalytic
activity of the HIV1 protease, as reported by others. 13-16

Fig. 6. Reverse-phase HPLC of the HIV1 PR99 protease. 6A. Samples of protease-containing
fractions, isolated by isoelectric focusing, were loaded on a C3 column. Material was
eluted using a water:acetonitrile gradient (in 0.1% TFA) of 0-25% for 10 minutes, 25-85%
for 40 minutes, and 85-100% for 5 minutes at a flow rate of 1 ml/min. Elution was
monitored at 280 nm. 6B. Coomassie Blue-stained 15% polyacrylamide gel containing SDS of
yeast expression products. Molecular weight markers (lane 1). Yeast lysate of strain AB116
harboring the vector pBS24.1 with no protease insert (lane 2). Yeast lysate of strain
AB116 harboring the plasmid pSOD/179 (lane 3). Concentrated yeast media of strain AB110
harboring pBS24.1 with no protease insert (lane 4); pPR99 (lane 5); pPR179 (lane 6).
Purified HIV1 protease, PR99, 0.4 µg (lane 7).

Fig. 7. Amino-terminal sequence analysis for purified HIV1 [text missing]. [Illegible]
sequencing was performed on an Applied Biosystems Model 470A gas-phase protein sequenator
equipped with an on-line phenylthiohydantoin (PTH) amino acid analyzer. PTH-amino acid
separations were performed by reverse-phase chromatography on a Brownlee C18 column. The
predicted amino acid sequence of the HIV1 protease 8 is written using the one-letter amino
[text missing]. p10(99) is the mature HIV1 protease. p10*(94) is the protease lacking the
first 5 amino-terminal residues. [Text missing]

In order to express the protease in a more purified state, a highly efficient secretion
pathway in yeast was employed. By fusing the leader sequence of the yeast α-factor
pheromone to either a 179 amino acid precursor of the protease or to the 99 amino acid
mature form of the protease, the viral enzyme was exported from the cell. This was
verified by immunological detection of the viral protease in the media and by showing
enzymatic activity of the secreted protease in vitro on the substrate Pr53gag. Inefficient
but detectable self-processing was observed when the active site variant PR179 D25E was
secreted, resulting in the accumulation of partially processed precursors, PRC and PRN.
This suggests that the variant protease has reduced but measurable autocatalytic activity.
However, the possibility that such polypeptides are generated by the enzymatic activity of
a yeast protease cannot be excluded at this time. It is more likely that the glutamic acid
can still act as a general acid/general base in the hydrolysis reaction but is inefficient
when compared to an aspartic acid. The extra methylene group of the Glu presumably places
the active site carboxylate out of catalytic register, resulting in a reduced activity of
the protease. We are currently investigating whether secreted PR D25E can process the
Pr53gag precursor and various synthetic substrates in vitro.

Secretion of the viral enzyme together with its uncommonly high isoelectric point and
extremely hydrophobic character allowed rapid purification of the enzyme using preparative
isoelectric focusing and reverse-phase HPLC. The media from transformed yeast was
concentrated by ultrafiltration and the secreted protease was retained by a 10,000
molecular weight cut-off membrane. This is consistent with the predicted homodimeric
quaternary structure of the protease of molecular weight 20,000. The protease has an
isoelectric point (pI) of approximately 10 as judged by the preparative isoelectric
focusing profile. This is in agreement with the predicted pI of the enzyme, which was
calculated from the individual pIs of the amino acids. Since the yeast secretion vector
takes advantage of the natural proteolytic processing events that occur along the α-factor
secretion pathway, this expression system can be used to overproduce variant forms of the
protease that do not exhibit catalytic activity.
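
A predicted isoelectric point of the kind mentioned above can be estimated from the ionizable groups of a sequence. The sketch below is one common approach, not necessarily the calculation used in this work: it sums Henderson-Hasselbalch charges and finds the pH of zero net charge by bisection. The pKa values are approximate assumed values (published scales differ), and the example peptide is hypothetical rather than the protease sequence.

```python
# Approximate pKa values for ionizable groups (assumed; published scales differ).
PKA_POSITIVE = {"N_TERM": 8.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"C_TERM": 3.1, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(sequence, ph):
    """Net charge of a peptide (one-letter codes) at the given pH."""
    positive = ["N_TERM"] + [aa for aa in sequence if aa in PKA_POSITIVE]
    negative = ["C_TERM"] + [aa for aa in sequence if aa in PKA_NEGATIVE]
    charge = sum(1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[g])) for g in positive)
    charge -= sum(1.0 / (1.0 + 10 ** (PKA_NEGATIVE[g] - ph)) for g in negative)
    return charge


def estimate_pi(sequence, lo=0.0, hi=14.0, iterations=60):
    """pH at which the net charge is zero, found by bisection.

    The net charge decreases monotonically with pH, so bisection on
    the interval [lo, hi] converges to the single zero crossing.
    """
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if net_charge(sequence, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical peptide used only to exercise the functions.
example = "ACDKRH"
print(f"estimated pI of {example}: {estimate_pi(example):.2f}")
```

A sequence rich in Lys and Arg and poor in Asp and Glu gives a high estimate with this method, which is the behavior the purification scheme relies on.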

Gas phase sequence analysis of the purified pro- 
tease resulted in a primary product that began at 
Pro1 of the mature enzyme. This result confirmed
the predicted amino acid sequence that results from
proteolysis by the yeast KEX2 processing enzyme. 57
A sequence lacking the first five amino acids was
also found by sequence analysis of the purified pro- 
tease. This sequence can be generated by proteolysis 
of the Leu 5-Trp 6 peptide bond and may represent 
an autocatalytic process of the protease. The role 
this proteolysis product plays in the activity of the 
enzyme remains to be determined. The efficient 
yeast expression systems described for the intracel- 
lular or extracellular expression of the HIV1 pro- 
tease will facilitate future biochemical studies on 
authentic or engineered variants of this enzyme. 
The vectors should also prove to be useful for other 
members of the aspartyl protease family associated 
with retroviral polyprotein processing. Finally, it 
may be possible to use yeast secretion of active viral 
protease to develop rapid in vitro screens and/or in 
vivo selections to monitor the efficiency of potential 
inhibitors of the aspartyl proteolytic activity.

A synthetic DNA fragment encoding a protease precursor of the human immunodeficiency virus
type 2 (HIV2) was cloned and expressed in bacteria and yeast. A recombinant plasmid
encoding a hybrid polypeptide consisting of human superoxide dismutase and an HIV2
protease precursor of 113 amino acids was constructed for regulated intracellular
expression in bacteria. Induction of this plasmid produced an autoprocessed form of the
retroviral enzyme possessing the correct molecular weight. Overexpression and secretion of
the protease from yeast was achieved with an expression vector encoding the yeast
pheromone α-factor signal/leader sequence fused to a protease precursor of 115 amino
acids. Amino-terminal sequence analysis confirmed that the viral enzyme exported from
yeast was correctly processed from its precursor by cleavage of the predicted Ala-Pro
peptide bond located at the NH2 terminus of the protease in the pol open reading frame. No
additional amino acid residues were required at the COOH terminus of the protease for this
autoproteolytic event. The HIV2 protease expressed in bacteria and yeast was active in an
in vitro assay when tested on the HIV1 polyprotein precursor, myristylated Pr53gag. Two
synthetic peptides representing junction sequences in the HIV1 gag-pol precursor were used
to assay purified HIV2 protease. The enzyme exhibited a kcat/Km of 23.2 min⁻¹ mM⁻¹ on the
HIV1 matrix-capsid junction peptide and a kcat/Km of 71.4 min⁻¹ mM⁻¹ on the
protease-reverse transcriptase junction peptide. These rates show that the HIV2 enzyme is
efficient at hydrolyzing the HIV1 peptide junctions, revealing the analogous nature of the
substrate specificities of the two enzymes.
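
The two kcat/Km values quoted above can be compared directly, since at substrate concentrations well below Km the initial rate reduces to v = (kcat/Km)[E][S]. The sketch below illustrates that relation. The enzyme and substrate concentrations are hypothetical, and the function and dictionary names are invented for the example.

```python
# kcat/Km values quoted in the text, in min^-1 mM^-1.
KCAT_OVER_KM = {
    "matrix-capsid": 23.2,
    "protease-reverse transcriptase": 71.4,
}


def initial_rate(kcat_over_km, enzyme_mM, substrate_mM):
    """Initial rate in mM/min from v = (kcat/Km)[E][S].

    This low-substrate limit of the Michaelis-Menten equation holds
    only when the substrate concentration is far below Km.
    """
    return kcat_over_km * enzyme_mM * substrate_mM


# Ratio of the two specificity constants (dimensionless).
relative_efficiency = (
    KCAT_OVER_KM["protease-reverse transcriptase"] / KCAT_OVER_KM["matrix-capsid"]
)
print(f"PR-RT junction is hydrolyzed {relative_efficiency:.2f}x more efficiently")

# Hypothetical concentrations, chosen only to illustrate the units.
rate = initial_rate(KCAT_OVER_KM["matrix-capsid"], enzyme_mM=1e-4, substrate_mM=0.01)
print(f"initial rate: {rate:.3e} mM/min")
```

On these figures the protease-reverse transcriptase junction peptide is the better substrate by roughly a factor of three.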



The human immunodeficiency virus (HIV) 1 (1) is the etio- 
logical agent of the acquired immunodeficiency syndrome 
(AIDS), a disease that has evolved into a worldwide public
health problem. In addition to HIV type 1 (HIV1) (2-4) which
has been implicated in epidemic AIDS in North America,
Europe, and Central Africa (5), another retroviral group
designated HIV2 (6) has been identified primarily in AIDS
patients from West Africa (5, 7). Nucleotide sequence
comparisons along with immunological studies carried out on
several isolates of HIV2 (6-10) have revealed that this
retrovirus is more closely related to the simian immunodeficiency
virus (11, 12) than to HIV1. The gag and pol proteins of HIV2
can be precipitated by antibodies in sera from patients
infected with HIV1, revealing a conservation of antigenic
determinants between the HIV1 and HIV2 proteins. However, the
gag and pol products of HIV2 have less than 60% identity in
their amino acid sequences (6) when compared with the
equivalent HIV1 proteins. Moreover, variation in the size,
amino acid composition, and proteolytic cleavage junctions is
also evident when the HIV2 polyproteins are compared with
their HIV1 counterparts.

The rapid spread of HIV2 (5) underscores the need to 
understand the molecular and structural biology of this retro- 
virus for development of strategies to treat and control AIDS 
infections. One attractive target in the effort to arrest HIV 
replication is the viral protease encoded at the 5' end of the 
pol gene (13). This enzyme plays an essential role in the viral 
life cycle by processing the gag and gag/pol polyproteins to 
the mature structural proteins and enzymes required for
virion formation. Recent studies of HIV1 protease have resulted
in the elucidation of the three-dimensional structure of the
enzyme (14-16). These studies confirm that this retroviral
enzyme is a homo-dimeric member of the aspartyl protease
family as anticipated by others (17) and permit structure-based
design of protease inhibitors that may eventually serve
as antiviral agents (18-20). Although the HIV1 and HIV2
proteases presumably play a similar role in viral replication,
significant differences exist between the two enzymes. The
HIV1 and HIV2 proteases are both 99 amino acids in length
but share only 47.5% sequence identity (3, 6) and display
unique substrate specificities. Moreover, the HIV2 protease
lacks the 2 cysteine amino acid residues found in the HIV1
counterpart, does not react with polyclonal antibodies raised
against the HIV1 protease, and is predicted to exhibit a much
lower isoelectric point. These differences require that an
independent analysis be carried out on the HIV2 protease.

The development of rapid and efficient systems to overpro- 
duce a soluble and authentic form of this viral enzyme will 
facilitate further biochemical and biophysical studies. Fur- 
thermore, the availability of in vitro assays for the HIV2 
protease is essential to address specific questions relating to 
the mode of action and specificity of the enzyme and its 
engineered variants. Two recent studies have reported on the 
chemical synthesis (21) and bacterial expression of the HIV2 
protease (22). We have previously reported the expression in
bacteria (23) and yeast (24), purification, and initial
characterization of the HIV1 protease. Here we demonstrate that a
precursor form of the HIV2 protease will autoprocess in
bacteria and yeast to yield a mature and active form of the
protease. This protease accurately cleaves in vitro the
heterologous substrate, HIV1 myristylated Pr53gag, as well as
synthetic peptides of two junctional sequences of the gag-pol
precursor.

MATERIALS AND METHODS 

Strains— Escherichia coli D1210 (25) was used for bacterial expres- 
sion of the H1V2 protease. Saccharomyces cerevisiae AB110 (24) was 
used for yeast expression and secretion of the HIV2 protease. 

Gene and Piasmid Constructions — A 360-base pair synthetic DNA 
fragment was constructed that encoded a 112-amino acid precursor 
of the HIV2 protease from isolate ROD (6). The precursor contained 
13 additional amino acids at the NH2 terminus of the 99-amino acid 
protease and was constructed from 17 overlapping oligonucleotides. 
Viral codons were chosen to construct the coding sequence except at
Val-10 (GTA→GTG), Thr-12 (ACA→ACC), and Val-71 (GTA→GTC).
Codons 10 and 12 were chosen to incorporate a unique BstEII
site and codon 71 was chosen to incorporate a unique HpaII site. The
endonuclease restriction sites XbaI, NcoI, and BglII were included
sequentially at the 5' end and a SalI site was included at the 3' end
of the synthetic gene. Fifteen of the oligonucleotides used in the
construction were 45 bases in length and had 22- or 23-base pair
overlaps. The 5'- and 3'-terminal oligonucleotides were 27 and 18
bases in length, respectively (Fig. 1A). The DNA construction included
a stop codon at the COOH terminus of the protease that
provided the first G of the SalI site at the 3' end of the gene. The
endonuclease restriction sites located at the 5' and 3' ends of the
DNA segment facilitated the subsequent cloning of the gene into
appropriate plasmids and expression vectors. The BstEII and HpaII
sites were introduced to permit dissection of the gene. Oligonucleotides
were synthesized using solid-phase phosphoramidite chemistry
on an Applied Biosystems 380A DNA synthesizer and were purified
as described elsewhere (24). Oligonucleotides were phosphorylated,
annealed, and ligated following standard protocols (26). Equimolar
concentrations (5 pmol) of each phosphorylated oligonucleotide were
mixed together in 2 mM Tris-HCl, pH 8.0, 10 mM MgCl2, 10 mM
dithiothreitol and heated to 90 °C for 2 min in a 1.5-ml microcentrifuge
tube in a covered water bath. The temperature control of the
bath was then shut off and the oligonucleotides allowed to anneal for
the 4 h needed for the bath temperature to reach 20 °C. The mixture
was then ligated at 20 °C for 20 min using T4 DNA ligase. The DNA
was precipitated with ethanol, dried, and suspended in water. After
digestion with the enzymes XbaI and SalI, the 360-base pair DNA
fragment was purified from a 7% polyacrylamide gel and cloned into
the M13 vector, mp19. Nine independent clones were isolated and
purified, single-stranded DNA was harvested, and the insert of each
isolate was sequenced entirely using the chain-termination method
(27). Each isolate contained discrepancies when compared with the
expected sequence. 
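
As a rough consistency check on the construction described above, the stated oligonucleotide lengths can be summed and compared with the size of the cloned fragment. The sketch below is illustrative only; it assumes that the 17 oligonucleotides tile both strands completely and it ignores the short single-stranded ends left by restriction digestion.

```python
# Oligonucleotide lengths stated in the text: fifteen 45-mers plus a
# 27-base 5'-terminal and an 18-base 3'-terminal oligonucleotide.
internal = [45] * 15
terminal = [27, 18]
oligos = internal + terminal

# Each 45-mer pairs with two neighbours on the opposite strand,
# with overlaps of 22 and 23 bases (22 + 23 = 45).
assert 22 + 23 == 45

total_bases = sum(oligos)      # bases summed over both strands
duplex_bp = total_bases // 2   # assumption: both strands fully paired

print(len(oligos), total_bases, duplex_bp)  # 17 720 360
```

Under these assumptions the 720 synthesized bases correspond to a 360-base pair duplex, in agreement with the fragment size given in the text.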

One isolate contained three transitions, Val-67 (GTA→ATA), Asp-79
(GAC→AAC), and Asn-83 (AAC→AAT), and one deletion, Ala-92
(GCC→GC), in the coding region of the mature protease. This isolate
also contained an insertion/substitution at Arg-5 (AGA→AAGC).
Four isolates were identical to the previous isolate except that the C
deletion at codon 92 was not present. Three isolates contained four
substitutions in the mature protease, Ala-34 (GCA→GTA), Lys-45
(AAA→AAC), Val-71 (GTC→GTA), and Ala-92 (GCC→GCT). One
isolate contained two transitions, Asp-79 (GAC→AAC) and Asn-83
(AAC→AAT), in the mature protease and one insertion/substitution
at Arg-5 (AGA→AAGC). Although the G to A and C to T transitions
can be accounted for by guanine modification during chemical DNA
synthesis (28), no general pattern can be discerned to account for the
discrepancies, which may have resulted from machine failure or
recombinant inaccuracies. The isolate with one insertion and two
single-base substitutions was chosen for repair. The substitution at
position 83 was a silent mutation and was left uncorrected. Two
oligonucleotides, 5' GGT GCA GCA AGT CCT CTG TTG GTG GCT
CCC 3' and 5' TGA TTG GGG TGT CGCCTG TCA TTA T 3',
were used to repair the insertion/substitution at Arg-5 and the
substitution at Asp-79, respectively. One round of site-directed
mutagenesis using both oligonucleotides and methods previously
described (29) was used to obtain the desired gene sequence. The
repaired DNA was sequenced completely and used for all subsequent
DNA constructions.
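
The single-base discrepancies listed above can be tabulated by base-change type and coding effect. The following sketch is an illustration, not part of the original analysis; the helper `classify` and the abbreviated codon table (standard genetic code, restricted to the codons named in the text) are introduced here for convenience.

```python
# Standard genetic code, limited to the codons mentioned in the text.
CODON = {
    "GTA": "Val", "GTC": "Val", "GTG": "Val", "ATA": "Ile",
    "GAC": "Asp", "AAC": "Asn", "AAT": "Asn", "AAA": "Lys",
    "GCA": "Ala", "GCC": "Ala", "GCT": "Ala",
}
PURINES = {"A", "G"}


def classify(ref, alt):
    """Return (base-change kind, coding effect) for a one-base codon change."""
    if len(ref) != len(alt):
        raise ValueError("codons must have equal length")
    diffs = [(r, a) for r, a in zip(ref, alt) if r != a]
    if len(diffs) != 1:
        raise ValueError("expected exactly one substituted base")
    r, a = diffs[0]
    # Purine<->purine or pyrimidine<->pyrimidine is a transition.
    kind = "transition" if (r in PURINES) == (a in PURINES) else "transversion"
    same = CODON[ref] == CODON[alt]
    effect = "silent" if same else f"{CODON[ref]}->{CODON[alt]}"
    return kind, effect


# Substitutions reported for the sequenced isolates.
changes = {
    "Val-67": ("GTA", "ATA"),
    "Asp-79": ("GAC", "AAC"),
    "Asn-83": ("AAC", "AAT"),
    "Ala-34": ("GCA", "GTA"),
    "Lys-45": ("AAA", "AAC"),
    "Val-71": ("GTC", "GTA"),
    "Ala-92": ("GCC", "GCT"),
}
for site, (ref, alt) in changes.items():
    print(site, ref, alt, *classify(ref, alt))
```

In this tabulation the change at Asn-83 is a silent transition, consistent with the decision to leave it uncorrected, whereas the change at Asp-79 alters the encoded residue and was repaired.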

For internal localization of the HIV2 protease in bacteria, the
plasmid pSOD/HIV2PR113 was constructed. The 347-bp NcoI-SalI
DNA fragment encoding the HIV2 protease was purified and ligated
to the NcoI-SalI-digested vector pSODCF2 (30). This vector provided
the β-lactamase gene for selection, the ColE1 origin of replication for
autonomous replication in E. coli and the inducible tac promoter for
transcriptional control. Plasmid pSOD/HIV2PR113 encodes a
polypeptide consisting of human superoxide dismutase (hSOD) fused to
a 113-amino acid precursor of the HIV2 protease (Fig. 1B). The
cloning strategy involved replacing the COOH-terminal alanine of
hSOD with methionine and glycine 72 and threonine 74 of the pol
polyprotein (6) by an alanine and a leucine, respectively.
Transformation and selection of recombinants was accomplished using
standard procedures (26).

For secretion of the HIV2 protease from yeast, the plasmid
pHIV2PR115 was constructed. The 339-bp BglII-SalI DNA fragment
encoding the HIV2 protease was cloned into the pBS24.1-derived
plasmid pPR179 (24). This yeast expression vector contains 2μ
sequences for autonomous replication in yeast, the glucose-regulated
hybrid promoter ADH2/GAPDH, the α-factor terminator to ensure
transcription termination, and the yeast genes leu2-d and ura3 for
selection. The β-lactamase gene and the ColE1 origin of replication
are also present in this shuttle vector. Plasmid pHIV2PR115 encodes
an HIV2 protease precursor of 115 amino acids with its NH2 terminus
fused to α-factor signal/leader sequences that contain the KEX2
processing site (31) (Fig. 1C). The HIV2 protease precursor expressed
from this vector contains the sequence Phe-Arg-Glu-Asp-Leu (residues
70-74 of the pol polyprotein) at its NH2 terminus instead of the
corresponding sequence Ala-Gly-Gly-Asp-Thr (6) due to genetic
manipulations required for cloning purposes. Yeast transformation and
selection of leucine/uracil prototrophs in leucine- and uracil-deficient
media was achieved essentially as previously described (32).

Expression of HIV2 Protease in Bacteria and Yeast— E. coli D1210 
cells harboring plasmid pSOD/HIV2PR113 were grown for 15-18 h
at 37 °C in 3 ml of Luria broth (26) containing ampicillin at 40 μg/ml.
The overnight culture was used to inoculate 60 ml of M9 minimal
medium containing ampicillin at 40 μg/ml (26) and cells were grown
for 2 h at 37 °C with vigorous shaking. Isopropyl-β-D-thiogalactopyranoside
(IPTG) was added at 200 μM final concentration, and cultures
were incubated for another 18 h. Aliquots of the induced culture
containing 0.3 OD650 were collected by centrifugation at 10,000 × g
for 10 min, and pellets were lysed by repeated boiling and freezing in
40 μl of 63 mM Tris-HCl, pH 6.8, 50 mM dithiothreitol, 10% glycerol,
3% sodium dodecyl sulfate (SDS), 0.02% bromphenol blue. Proteins
fractionated by SDS-PAGE in 15% polyacrylamide gels were transferred
to nitrocellulose filters for 1 h at 75 volts using a Bio-Rad
Trans-Blot apparatus. The HIV2 protease was identified on immunoblots
using rabbit polyclonal antibodies raised against a synthetic
HIV2 protease (footnote 2) as the primary antibody. A goat anti-rabbit
antibody conjugated to horseradish peroxidase (Tago) was used as the
secondary antibody for immunodetection.

Cultures of S. cerevisiae AB110 cells harboring pHIV2PR115 were
grown in 50 ml of leucine- and uracil-deficient media for 24-48 h at
30 °C, and cells were collected by centrifugation at 7000 × g for 20
min. Proteins in 5 ml of supernatants were precipitated with 2.5 ml
of 50% trichloroacetic acid containing 2 mg/ml deoxycholate. After
incubation at 4 °C for 30 min, pellets were collected, washed with
acetone, dried, and suspended in 20 μl of sample buffer (33). Proteins
fractionated by SDS-PAGE were visualized by Coomassie Blue staining
or transferred to nitrocellulose filters for immunodetection.

For large scale yeast fermentation and expression of the HIV2
protease, a 115-liter fermentor was prepared with 68 liters of leucine-
and uracil-deficient media (32) containing 0.05% (w/v) deoxycholate.
The fermentor was then inoculated with a 6-liter seed culture of S.
cerevisiae AB110 harboring the recombinant plasmid pHIV2PR115.
The seed culture had been grown for 24 h in leucine-deficient media
to ensure a high copy number of the expression plasmid. Culture
conditions were 30 °C, 5.0 psi and 300 rpm. Cell density, pH, CO2
concentration, and glucose concentration measurements were made
every 2 h throughout the duration of the fermentation run, which
lasted 72 h. Cell density measurements were made using an in-line
spectrophotometer set at 650 nm (OD650). The pH was determined
with an in-line pH electrode. The concentration of CO2 was determined
by infrared spectroscopy using an IR-703 gas analyzer (Infrared
Industries, Inc., Santa Barbara, CA) mounted on the fermentation
vessel to sample the head space. Glucose concentration measurements
were made using a hexokinase-coupled assay and ultraviolet
determination of NADH levels at 340 nm. Expression levels of the
HIV2 protease at 24, 30, 48, 52, and 72 h after inoculation were
estimated by immunoblotting as described above.

2 E. Bradley and I. Kuntz, unpublished results.

Preparation of Bacterial Protein Extracts and Concentrated Yeast 
Supernatants— Bacterial pellets (50 OD650) of cultures grown for 4 h
after induction, as described above, were suspended in 1 ml of 50 mM
Tris-HCl, pH 8.0, 100 mM KCl, 5 mM EDTA, 1 mM phenylmethylsulfonyl
fluoride, 0.5% Triton X-100, and cells were disrupted by
sonication for 1 min at 4 °C. Lysates were clarified by centrifugation.
All centrifugations were at 45,000 × g for 15 min. Protamine sulfate
was added to the supernatant to a final concentration of 0.5% (w/v).
After incubation at 20 °C for 15 min, the precipitate was removed by
centrifugation. Ammonium sulfate was added to the supernatant to
15% of saturation (w/v), incubated at 4 °C for 1 h, and a pellet was
collected by centrifugation. Saturated ammonium sulfate was then
added to the supernatant to a final concentration of 40% (w/v). After
incubation for 1 h at 20 °C, the pellet was collected, resuspended in
0.2 ml of Tris-HCl, pH 7.5, and dialyzed against the same buffer
using a microdialyzer (Bethesda Research Laboratories) with membranes
of 3,500 molecular weight cut-off.

HIV2 protease expressed and secreted from yeast in volumes less 
than 100 ml was concentrated by ultrafiltration from the media of 
yeast AB110 cells harboring pHIV2PR115. Cultures grown for 48 h
were centrifuged to remove the cells, and the supernatant was con- 
centrated approximately 20-fold at 4 °C, using a Centricon 10 (Ami- 
con) membrane with a molecular weight cut-off of 10,000. 

Protein Purification — The HIV2 protease was purified from the 
media of yeast cultures expressing plasmid pHIV2PR115 grown 72-
96 h. During the initial stages of purification, the protease was 
monitored by immunoblots using polyclonal rabbit antibodies raised 
to a chemically synthesized HIV2 99 amino acid peptide. Following 
centrifugation of the cells, the culture supernatant (5 liters) was 
adjusted to 1 M ammonium sulfate, 20 mM sodium phosphate, pH 
8.0, 1 mM EDTA and 1 mM phenylmethylsulfonyl fluoride. This 
material was loaded on a phenyl-Sepharose column (5 x 30-cm, 
Pharmacia LKB Biotechnology Inc.) pre-equilibrated with the same 
buffer. The column was washed with a linear gradient of 1-0 M 
ammonium sulfate in buffer followed by two column volumes of water. 
The protease, which eluted in the later fractions of the water wash in 
approximately 250 ml, was further purified by preparative isoelectric 
focusing on a Rotofor unit (Bio-Rad). A mixture of 1 ml, pH 3-10, 
Ampholytes and 0.25 ml, pH 5-8, Ampholytes was added to 55 ml of 
protease sample for each of five Rotofor fractionations. The fractions 
in the range of pH 5.0-6.5 were pooled. These samples were further 
purified by reversed-phase HPLC using a preparative C3 column (Pro
10/300 Protein Plus, Du Pont, 4.6 × 21.2 cm) and eluted with a linear
gradient of 25-85% acetonitrile/water in 0.1% trifluoroacetic acid
over a period of 60 min at a flow rate of 10 ml/min. The protein
eluted at approximately 60% acetonitrile and was lyophilized and
stored at -20 °C. The resulting enzyme is 95% pure as judged by
NH2-terminal sequence analysis as well as silver staining of SDS-
PAGE gels. For enzymatic analysis, the lyophilized enzyme was 
denatured in 0.05 mM sodium acetate, pH 5.5, containing 8 M urea, 
and refolded by a 10-fold dilution into 0.2 M sodium acetate, pH 5.5, 
containing 1 mM EDTA, 5 mM dithiothreitol, 10% glycerol, and 5% 
ethylene glycol. This procedure has been shown to restore enzymatic 
activity to preparations of HIV1 enzyme that have become inactive
due to denaturation (34). 

In Vitro Assay for HIV2 Protease on Pr53gag — A recombinant form
of the natural substrate Pr53gag has previously been expressed in yeast
(35). The myristylated polyprotein was purified and shown to be a
substrate for the HIV1 protease (24). In a similar fashion, we monitored
the activity of the HIV2 protease expressed in bacteria and
yeast on myristylated Pr53gag. The heterologous substrate (1 μg) was
incubated in 10 mM Tris-HCl, pH 7.0, containing 130 mM NaCl, 1
mM EDTA, and 1 mM phenylmethylsulfonyl fluoride with the concentrated
yeast media or partially purified bacterial lysates. After 8
h of incubation at 25 °C, the reaction products were analyzed by
SDS-PAGE in 15% polyacrylamide gels and immunoblotted using
serum from AIDS patients as the primary antibody. The serum was
inactivated in a Biosafety Level 3 facility by heating at 56 °C for 35
min, treating with psoralen at 25 μg/ml final concentration, and UV
irradiation on ice. Visualization of the specific bands on the immunoblots
was achieved with goat anti-human antibodies conjugated to
horseradish peroxidase (Tago) as the second antibody. Lysates of
HIV1-infected cells (36) were included as markers to visualize HIV1
viral proteins. The lysates were inactivated in a Biosafety Level 3
facility by treating with 0.5% Triton X-100 final concentration.
Pepstatin A was used at 10 mM final concentration in some assays to
inhibit the activity of the retroviral protease.

In Vitro Assay for HIV2 Protease on Synthetic Peptide Substrates— 
The HIV2 protease was assayed against the decapeptide, Ala-Thr-
Leu-Asn-Phe-Pro-Ile-Ser-Pro-Trp and the octapeptide, Ser-Gln-
Asn-Tyr-Pro-Ile-Val-Gln. The decapeptide corresponds to the HIV1
carboxyl-terminal autoprocessing site, and the octapeptide corresponds
to the HIV1 matrix-capsid cleavage site (2). The peptides
were synthesized using conventional solid-phase methods.
Concentrations of the enzyme stock solutions were established by
titration with the substrate-based inhibitor
Val-Ser-Gln-Asn-Leuψ(CH(OH)CH2)Val-Ile-Val (34). Reactions were carried out in 0.1 ml
of 0.1 M sodium acetate buffer, pH 4.7, containing 2 mM EDTA, 1 M
NaCl, and 0.05-5 mM peptide substrate. The effect of salt and pH on
enzyme activity was tested by using 0.1 M sodium acetate buffer at
either pH 4.7 or 5.5 and containing either 0.25 or 1 M NaCl. Typically,
1-3 × 10^-4 mg of HIV2 protease was added to initiate the reaction,
which was stopped after 2 h at 37 °C by addition of 50 μl of cold 0.1%
trifluoroacetic acid on ice. Conditions were adjusted so that <25% of
the substrate was hydrolyzed during the incubation. Reaction products
were separated by reversed-phase HPLC using a C18 Pecosphere
(0.46 × 3.3 cm, Perkin-Elmer) column. Products of the decapeptide
(Ala-Thr-Leu-Asn-Phe and Pro-Ile-Ser-Pro-Trp) were resolved with
a gradient of 10-50% acetonitrile in 0.1% trifluoroacetic acid, while
the octapeptide fragments (Ser-Gln-Asn-Tyr and Pro-Ile-Val-Gln)
were observed with a gradient of 0-35% acetonitrile. Absorbance was
monitored at 280 nm, and hydrolysis of the peptides was quantitated
by integration of the peak areas and comparison to product standard
curves. Each data point was measured in triplicate. The program
Enzfitter was used to fit data to Michaelis-Menten kinetics (37).
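
For orientation, the Michaelis-Menten rate law and a simple way of estimating its constants are sketched below. Enzfitter performs nonlinear regression; the sketch instead uses an ordinary least-squares fit to the double-reciprocal (Lineweaver-Burk) form, which is a simpler stand-in and not the method used in this work. The substrate concentrations span the 0.05-5 mM range given above, but the rates are synthetic, noise-free values generated from assumed constants, not measurements.

```python
def michaelis_menten(s, vmax, km):
    """Initial rate v = Vmax * [S] / (Km + [S])."""
    return vmax * s / (km + s)


def lineweaver_burk_fit(substrate, rate):
    """Least-squares line through (1/[S], 1/v); returns (vmax, km).

    1/v = (Km/Vmax) * (1/[S]) + 1/Vmax, so the intercept gives Vmax
    and the slope gives Km/Vmax.
    """
    x = [1.0 / s for s in substrate]
    y = [1.0 / v for v in rate]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / intercept
    km = slope * vmax
    return vmax, km


# Synthetic illustration: assumed constants, arbitrary rate units.
conc_mM = [0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0]
assumed_vmax, assumed_km = 1.0, 0.14
rates = [michaelis_menten(s, assumed_vmax, assumed_km) for s in conc_mM]

fit_vmax, fit_km = lineweaver_burk_fit(conc_mM, rates)
print(round(fit_vmax, 6), round(fit_km, 6))
```

With noise-free input the linear fit returns the assumed constants. With real data the double-reciprocal transform weights low-concentration points heavily, which is one reason nonlinear regression is generally preferred.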

Amino-terminal Sequence Analysis— The retroviral protease in 20
ml of yeast supernatant was obtained from yeast AB110 cultures
expressing the recombinant plasmid pHIV2PR115. Proteins were
precipitated with trichloroacetic acid, fractionated in a 15% polyacryl- 
amide gel containing SDS and transferred electrophoretically to 
polyvinylidene difluoride membranes (Immobilon-P Transfer Mem- 
brane, Millipore) in 10 mM CAPS, pH 11.0, 10% methanol. Strips of 
polyvinylidene difluoride membranes were excised and the protease 
subjected to microsequence analysis. Automated Edman degradation 
chemistry was performed using a 470A Applied Biosystems gas-phase 
sequencer equipped with an on-line phenylthiohydantoin amino acid 
analyzer. Phenylthiohydantoin-amino acids were separated by
reversed-phase chromatography on a Brownlee C18 column and identified
as described previously (38).

RESULTS 

Bacterial expression of the HIV2 protease was achieved by
cloning a synthetic DNA fragment (Fig. 1A) encoding the
retroviral enzyme into a bacterial expression vector derived
from pSOD/PR179 (23). The resulting recombinant plasmid
pSOD/HIV2PR113 (Fig. 1B) encodes a polyprotein containing
a 153-amino acid human superoxide dismutase and an
HIV2 protease precursor of 113 amino acids. The hSOD
sequences are essential for expression of the 113-amino acid
protease precursor since a similar construction lacking the
hSOD sequences resulted in no detectable HIV2 protease.
The viral protease precursor contains an NH2-terminal extension
of 14 amino acids to the 99-amino acid protease. There
is no COOH-terminal extension since a stop codon was placed
immediately downstream of the COOH-terminal leucine of
the protease. The expression of the fusion protein was controlled
by IPTG induction of the tac promoter located 5' to
the hSOD gene. Protease expression was monitored in total
cell lysates by SDS-PAGE and immunoblotting. An immunoreactive
band of approximately 10 kDa (PR) can be observed
within 0.25 to 0.5 h of induction using rabbit polyclonal
antibodies raised against a synthetic HIV2 protease (Fig. 2,
lanes 2 and 3). The 10-kDa band was not detected using rabbit
polyclonal antibodies raised against the HIV1 protease. The
10-kDa species is in agreement with the 10,721-dalton predicted
mass of the mature protease. This indicates that the
HIV2 protease is capable of autoprocessing within the bacterial
host as previously reported for the HIV1 protease (23,
39-41). Another major band of approximately 30 kDa (SOD/
PR) was also detected by the antibodies to the HIV2 protease.
This band is also detected by a mouse monoclonal antibody
raised against hSOD (results not shown), and probably represents
the 27,865-dalton SOD/HIV2 protease fusion containing
the unprocessed viral enzyme. As expected, the protease and
the hybrid precursor were not observed in the uninduced
cultures.

Fig. 1. Gene construction and recombinant plasmids for the expression of the HIV2 protease. A,
synthetic HIV2 protease gene. Schematic representation of the HIV2 protease synthetic gene. A 360-bp XbaI-SalI
DNA fragment was constructed using 17 overlapping synthetic oligonucleotides of 18, 27, and 45 nucleotides in
length. The XbaI, BglII, and SalI restriction sites were included to facilitate subcloning of the final construction.
The BstEII and HpaII restriction sites present in the coding region of the 99-amino acid protease (hatched) were
introduced to permit dissection of the gene. B, bacterial expression plasmid pSOD/HIV2PR113. A 339-bp synthetic
BglII-SalI DNA fragment encoding the 99-amino acid HIV2 protease and containing 13 additional residues at its
NH2 terminus was cloned into the plasmid pSOD/PR179 (23). The resulting plasmid encodes a hSOD/HIV2
protease hybrid precursor whose expression is under the control of the tac promoter. C, yeast expression plasmid
pHIV2PR115. The BglII-SalI restriction fragment described above was cloned into the plasmid pPR179 (24) that
provided the yeast α-factor signal/leader sequence containing the KEX-2 processing site to promote secretion of
the retroviral enzyme from the cell. The HIV2 recombinant plasmid also contains the ADH2/GAPDH promoter,
the α-factor terminator, 2μ yeast sequences and the genetic markers leu2-d, ura3, and β-lactamase.

Expression levels of the hybrid polypeptide and the mature
protease reach their maximum levels at approximately 3-4
and 5-6 h post-induction, respectively (Fig. 2, lanes 8-11).
Expression levels of both proteins are greatly reduced after
18 h suggesting that these proteins are susceptible to degradation
by E. coli proteases. Increasing levels of the 30-kDa
precursor band do not result in a concomitant increase of the
10-kDa protease band suggesting that release of the protease
from the hybrid polypeptide is slow relative to the proteolytic
turnover of the viral protease. The HIV2 protease is completely
soluble in this expression system and at maximum
expression levels is approximately 0.1% of the total cellular
proteins.

Fig. 2. Immunodetection of HIV2 protease expressed in E.
coli D1210. Cells harboring plasmid pSOD/HIV2PR113 were grown
in 60 ml of M9 minimal media (26) for 2 h, and cultures were induced
by the addition of IPTG at a final concentration of 0.2 mM. Samples
were removed at different times post-induction and 0.3 OD650 of cells
were lysed as described under "Materials and Methods." Proteins
fractionated by electrophoresis in a 15% polyacrylamide gel containing
SDS were transferred to nitrocellulose filters. Immunoblot analysis
was performed using rabbit polyclonal sera raised against a synthetic
HIV2 protease. The column numbers indicate hours after
induction with IPTG. PR is the protease; SOD/PR is the hybrid
precursor. Prestained proteins were loaded as molecular weight standards
(column M).

Fig. 3. Detection of HIV2 protease in yeast supernatants of
S. cerevisiae AB110. A, Coomassie Blue-stained proteins. B,
immunoreactivity with rabbit sera containing antibodies to a synthetic
HIV2 protease. Cells harboring pHIV2PR115 were grown in 50 ml of
leucine- and uracil-deficient media at 30 °C. After 24 (lane 2) or 48 h
(lane 3) of incubation, 10-ml aliquots of the cultures were pelleted by
centrifugation and proteins in 5 ml of yeast supernatants were
precipitated with trichloroacetic acid and fractionated by SDS-PAGE in
a 15% polyacrylamide gel containing SDS. Immunoblot analysis was
performed as described in Fig. 2. Trichloroacetic acid-precipitated
proteins of yeast supernatants of AB110 cells harboring the parental
plasmid pBS24.1 (24) are shown in lane 4. PR is the mature HIV2
protease.

To avoid the problems associated with intracellular proteolysis
in bacteria as well as to facilitate its purification, the
protease was expressed and secreted in yeast. Yeast expression
of the HIV2 protease was accomplished by fusing the synthetic
DNA fragment encoding the HIV2 protease to DNA
encoding the α-factor signal/leader sequences that contain
the KEX2 recognition cleavage site (Fig. 1C). The glucose-
regulatable hybrid promoter ADH2/GAPDH (42) is used to
control expression of the cloned gene. The α-factor terminator
(43) is included to ensure transcription termination. The
plasmid pHIV2PR115 encodes an HIV2 pol precursor of 115
amino acids with a predicted molecular weight of 12,435. A
protein band of approximately 10 kDa (PR) is observed by
Coomassie Blue staining (Fig. 3A, lane 3) and immunoblotting
(Fig. 3B, lane 3) in yeast supernatants of cells transformed
with pHIV2PR115, indicating autoprocessing of the precursor.
The viral protease is not observed in yeast supernatants
of cells grown for 24 h (A and B, lane 2) since glucose
consumption and the resultant induction of the ADH2/GAPDH
promoter takes place after 28-30 h of growth under
the culture conditions used (see below). Maximum levels of
expression (0.8-2.0 mg/liter of media) are observed after 48-
72 h. The HIV2 protease proved to be stable at 30 °C in the
yeast media for up to 6 days as judged by immunoblotting and
enzymatic analyses (results not shown) and represents a
predominant yeast-secreted protein. This protein is not seen
in yeast supernatants of cells harboring the parental plasmid
pBS24.1 (A and B, lane 4).

The NH2 terminus of the yeast-secreted HIV2 protease was
sequenced to unequivocally demonstrate that the 10-kDa 
product represents the correctly processed, mature, retroviral 
protease. The first 13-amino acid residues displayed the 
sequence Pro-Gln-Phe-Ser-Leu-Trp-Lys-Arg-Pro-Val-Val- 
Thr-Ala which is identical to that predicted for the HIV2 
protease (6). This confirms that the viral enzyme secreted 
from yeast was correctly released from the precursor by cleav- 
age at the Ala-Pro junction. 

To determine whether the recombinant HIV2 protease expressed
in bacteria and yeast represents an active form of the
viral enzyme, its ability to correctly cleave the HIV1 polyprotein
substrate, myristylated Pr53gag, was established. Both the
bacterial- (Fig. 4, lane 2) and yeast- (lane 6) expressed protease
can accurately process the HIV1 protein precursor as judged
by the generation of capsid protein (p24) and matrix protein
(p17) along the course of the incubation. Another protein
band of approximately 6 kDa (p6 or p7) that probably represents
a processing product of nucleocapsid protein (p15) (44)
was also detected by the human AIDS serum. This protein
had remained undetected when similar experiments were performed
with the HIV1 protease (24) due probably to a low
titer of antibodies against this protein. These processed species
are indistinguishable from those generated by a purified
HIV1 protease expressed in yeast (lane 3) (24) and comigrate
with their counterparts expressed in vivo (lane 4). Bacterial
extracts and yeast supernatants that do not contain the HIV2
protease do not generate specific viral proteins when incubated
with Pr53gag. Nonspecific proteolysis of the HIV1 polyprotein
precursor (lanes 1 and 5) is presumably caused by
endogenous bacterial and yeast proteases. Processing of
Pr53gag by the HIV2 protease was completely abolished when
10 mM pepstatin was included in the assay (results not
shown). This observation provides evidence that the HIV2
protease is an aspartyl protease as already established for the
HIV1 enzyme (15).

Fig. 4. Immunoblot analysis of in vitro processing of HIV1
Pr53gag polyprotein by bacterial and yeast expressed HIV2
protease. For each in vitro assay, approximately 1 μg of myristylated
Pr53gag precursor was incubated at 25 °C with bacterial extracts or
concentrated yeast supernatants as described under "Materials and
Methods." Immunoblot analysis was performed using AIDS patient
sera as the primary antibody. Lane 1, Pr53gag incubated with bacterial
extracts of cells harboring pSOD/HIV2PR113 before induction; lane
2, Pr53gag incubated with bacterial extracts of cells harboring pSOD/
HIV2PR113 4 h after induction; lane 3, Pr53gag incubated with purified
HIV1 protease expressed in yeast (24); lane 4, a total cell lysate
of HIV1 infected cells (36) as reference for viral proteins; lane 5,
Pr53gag incubated with concentrated yeast supernatants from yeast
cells harboring pBS24.1; lane 6, Pr53gag incubated with concentrated
yeast supernatants from yeast cells harboring pHIV2PR115; lane 7,
prestained molecular weight markers.

The HIV2 protease was purified from yeast supernatants
using a three-step purification procedure. Fig. 5 shows representative
samples from the various stages of the protocol. We
estimate that the initial concentration of the protease is
approximately 0.8 mg/liter of yeast supernatant (lane 2). The
subsequent steps involving phenyl-Sepharose chromatography,
preparative isoelectric focusing fractionation, and reversed-phase
HPLC result in homogeneous enzyme (lanes 3-5).
The phenyl-Sepharose column, which mimics the hydrophobic
peptide substrate, is used to concentrate the enzyme
from the large volume of yeast media. The protease-containing
fractions eluted from the column are subjected to preparative
isoelectric focusing on a Rotofor unit to remove high
molecular weight proteins and melanin polymers of heterogeneous
size. Migration of the HIV2 protease to its expected
isoelectric point of approximately 5.5 separates the enzyme
from the negatively charged polymeric contaminants which
migrate as a dark brown band in the pH range of 1-3.
Following removal of the Ampholytes by dialysis, the pH 5.0-
6.5 Rotofor fractions are loaded onto a preparative C3 column
and the protease is eluted as a single peak at approximately
60% acetonitrile; this is in agreement with its expected
hydrophobic nature. The single band shown in Fig. 5, lane 5,
is 95% pure as judged by amino acid analysis and silver
staining (data not shown). The overall yield of the protein is
between 10 and 20%. The purified protein is stable to rapid
freeze/thaw cycles and storage at -20 °C in 10% glycerol, 5%
ethylene glycol buffer. However, concentrated enzyme solutions
(0.25 mg/ml) kept at 0 °C can lose activity, presumably
due to autoproteolysis. For cases where enzyme activity is lost
due to denaturation, activity can be restored by refolding the
enzyme using a procedure similar to that used for HIV1
protease (34). The purified enzyme was assayed against the
HIV1 matrix-capsid octapeptide Ser-Gln-Asn-Tyr-Pro-Ile-
Val-Gln to yield a specific activity of 3.0 μmol/min/mg. This
value is comparable to that obtained for hydrolysis of the
same substrate by purified HIV1 protease (45).

Fig. 5. Purification of secreted HIV2 protease from yeast.
Samples from various steps of the purification procedure were run on
a 17.5% polyacrylamide gel containing SDS and stained with Coomassie
Blue. Samples were precipitated with trichloroacetic acid/
deoxycholate as described under "Materials and Methods" to reduce
the volume of the sample. Lane 1, molecular weight markers; lane 2,
yeast culture supernatant (5 ml, 0.1% of total); lane 3, pooled,
protease-containing fractions from the phenyl-Sepharose column (0.2%
of total); lane 4, preparative isoelectric focusing fractions migrating
in the pH 5.0-6.5 range (0.6% of total); lane 5, purified protease (3.5
μg, 0.8% of total) isolated by reversed-phase chromatography on a
preparative HPLC column.

The kinetic properties of the purified HIV2 protease were
evaluated using the two synthetic peptide substrates Ala-Thr-
Leu-Asn-Phe-Pro-Ile-Ser-Pro-Trp and Ser-Gln-Asn-Tyr-
Pro-Ile-Val-Gln. To ensure accuracy in all enzymatic analyses,
active site titrations using the substrate-based inhibitor
Val-Ser-Gln-Asn-Leuψ(CH(OH)CH2)Val-Ile-Val (34) were
used to determine the concentration of enzyme stock solutions.
Evaluation of two buffer conditions commonly used for
HIV1 protease assays showed that an 80% increase in enzyme
activity on either peptide substrate resulted by increasing the
salt concentration from 25 mM to 1.0 M. A further increase in
enzyme activity of approximately 10% resulted by lowering
the pH from 5.5 to 4.7. The optimal conditions of high salt
and low pH were used for determining the kinetic constants
of the octapeptide and decapeptide substrates. The enzyme
exhibited a kcat of 65 min^-1 and a Km of 2.8 ± 0.5 mM on the
octapeptide substrate and a kcat of 10 min^-1 and a Km of 0.140
± 0.045 mM on the decapeptide substrate (Fig. 6).
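
The kinetic constants above can be combined into catalytic efficiencies (kcat/Km) and into a maximal specific activity. The following is a minimal sketch; it assumes that kcat is expressed per 10,721-dalton protease monomer (the predicted mass quoted earlier), which the text does not state explicitly, and the function names are hypothetical.

```python
MATURE_PROTEASE_DA = 10721  # predicted mass of the mature protease (g/mol)


def catalytic_efficiency(kcat_per_min, km_mM):
    """kcat/Km in min^-1 mM^-1."""
    return kcat_per_min / km_mM


def vmax_umol_per_min_per_mg(kcat_per_min, mass_da=MATURE_PROTEASE_DA):
    """Maximal specific activity at saturating substrate.

    kcat [mol product per mol enzyme per min] divided by the molar mass
    [g/mol] gives mol/min/g; 1 mol/g equals 1e3 umol/mg.
    Assumption: kcat refers to one active site per mass_da of protein.
    """
    return kcat_per_min / mass_da * 1e3


octa = catalytic_efficiency(65, 2.8)      # octapeptide
deca = catalytic_efficiency(10, 0.140)    # decapeptide
print(round(octa, 1), round(deca, 1))     # 23.2 71.4
print(round(vmax_umol_per_min_per_mg(65), 2))  # 6.06
```

With these assumptions kcat/Km is about 23 min^-1 mM^-1 for the octapeptide and about 71 min^-1 mM^-1 for the decapeptide, so the decapeptide is the more efficiently processed substrate despite its lower kcat. If kcat were instead expressed per dimer, the Vmax figure would be halved.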

The behavior of yeast expressing the HIV2 protease in large
scale fermentation conditions was evaluated to establish baseline
growth and production profiles. Sixty-eight liters of
pHIV2PR115 transformed yeast were grown in a 115-liter
vessel. Results shown in Fig. 7A reveal that the yeast culture
remains in lag phase for approximately 15 h until an increase
in the cell density and a simultaneous decrease in the pH is
observed. The stationary phase is reached after approximately
65 h. Glucose utilization by the yeast resulted in a sharp
evolution of CO2 production from 20 to 28 h post-inoculation
(B). A concomitant shift in pH was also noted during this
time interval (A). As expected, the HIV2 protease was expressed
upon depletion of available glucose and was detected
immunologically after 30 h (C, lane 4). Maximal expression
levels were obtained at 72 h (C, lane 7) and the growth was
terminated due to the increasing pH of the media.

Fig. 6. Double-reciprocal plot of HIV-2 protease activity
versus concentration of peptide substrates ATLNFPISPW (■)
and SQNYPIVQ (O). Either 2.5 × 10^-4 mg (■) or 9.9 × 10^-5 mg (O)
HIV-2 protease was incubated with varying amounts of substrate for
2 h at 37 °C in 0.1 M sodium acetate buffer, pH 4.7, containing 1 M
NaCl and 2 mM EDTA. Enzyme activity was measured as described
under "Materials and Methods." Data were fit to the Michaelis-
Menten equation and kinetic constants were calculated using a
nonlinear regression program ("Enzfitter" from Biosoft).

Fig. 7. Fermentation profiles of yeast expressing and secreting
HIV2 protease. A, cell density (OD650) and pH profiles. B,
percent glucose concentration and percent CO2 concentration. C,
expression levels of secreted HIV2 protease from yeast. The HIV2
protease in 4 ml of yeast supernatant was precipitated with
trichloroacetic acid/deoxycholate, fractionated by electrophoresis in a 15%
polyacrylamide gel containing SDS and transferred to nitrocellulose
filters for immunoblot analysis as described under "Materials and
Methods." Protein molecular weight markers, lane 1; media from
pBS24.1 transformed yeast, 72 h, lane 2; media from pHIV2PR115
transformed yeast, 24 h, lane 3; 30 h, lane 4; 48 h, lane 5; 52 h, lane
6; 72 h, lane 7.

DISCUSSION 

The emergence and rapid spread of HIV2 as another causative
agent of AIDS (5, 6) raises the challenge to better
understand the molecular biology of this retrovirus. Since the
protease is essential for viral replication and represents an
attractive target for anti-AIDS therapeutics, we have cloned
and expressed the HIV2 protease in bacteria and yeast. This
should facilitate the purification of reagent quantities of protein
for further biochemical and biophysical studies. In addition,
the expression systems allow us to study the autoprocessing
event as well as the proteolytic activity of the mature
protease on the heterologous substrate HIV1 Pr53gag.

The HIV2 protease is capable of autoprocessing in bacteria
to form a mature and active form of the enzyme when expressed
as part of a human superoxide dismutase fusion
protein. The protease in the hSOD/HIV2 polyprotein has
only a 14-amino acid NH2-terminal extension and does not
contain additional amino acids at its COOH terminus. The
mature HIV2 protease was completely soluble, and the onset
of degradation could be monitored after 4 h of induction. The
autoprocessing event appears to be less efficient than that
observed for the hSOD/HIV1 protease precursor (23) since a
large percentage of the HIV2 protease remains in the hybrid
polypeptide throughout an 18-h time course of expression.
Although the lack of a COOH-terminal extension did not
prevent autoprocessing of HIV2, it may have contributed to
the inefficiency of the process. Amino acid sequences analogous
to those found in the hSOD/HIV1 polypeptide (23)
may be required in the hSOD/HIV2 polypeptide for efficient
autocatalysis from the polyprotein precursor.

When expressed and secreted in yeast, the HIV2 protease
was correctly released from a 115-amino acid pol precursor
lacking a COOH-terminal extension. This observation can
also be extended to the HIV1 protease, since a 154-amino
acid HIV1 pol precursor that does not contain additional
amino acid sequences at the COOH terminus of the
protease efficiently self-processes in yeast.^3 It has been postulated
that the HIV protease undergoes autocatalytic activation
when expressed as part of a higher molecular weight
precursor and that this activation is initiated at the COOH
terminus of the protease (40). The mature HIV1 and HIV2
proteases that we have obtained using yeast expression plasmids
encoding HIV precursors with no additional amino acid
sequence at their COOH termini support this idea and show
that the autocatalytic event does not require that the COOH-terminal
amino acids be present for processing the NH2-terminal
site.

The HIV2 protease monomer is predicted to be a 99-amino
acid protein that is released from the gag/pol polyprotein
after cleavage of NH2-terminal Ala-Pro and COOH-terminal
Leu-Pro peptide bonds. The amino-terminal sequence that
we determined for the yeast-expressed HIV2 protease represents
the first direct evidence that the potential Ala-Pro
cleavage site is recognized and hydrolyzed by the HIV2 enzyme
during autocatalysis. Autoprocessing of the HIV1 protease
at the analogous Phe-Pro cleavage sites has been reported
(23, 24, 39-41). Mass spectral analysis of purified HIV2
protease provides additional evidence that the mature HIV2
protease is correctly processed to form the 99-amino acid
enzyme monomer.^4

To determine whether the bacterial and yeast-expressed
HIV2 protease represents an active viral product with authentic
enzymatic activity, the pattern of proteolysis using purified
HIV1 Pr53gag precursor as a substrate (35) was determined.
Processing of murine leukemia virus and feline leukemia virus
gag precursors by a heterologous retroviral protease has been
reported (46-48). Both the bacterial and yeast-expressed

^3 S. Pichuantes, L. M. Babe, P. J. Barr, D. L. DeCamp, and C. S.
Craik, unpublished results.

^4 S. Kaur, L. M. Babe, D. L. DeCamp, A. Burlingame, and C. S.
Craik, unpublished results.

HIV2 proteases process the HIV1 gag polyprotein to yield a
product pattern that is indistinguishable from that generated
by an HIV1 protease expressed in yeast. The migration of the
proteins generated in vitro is also identical to the migration
of structural proteins from HIV1 virions. This shows that the
HIV2 protease expressed in both the bacterial and yeast
systems is enzymatically active and capable of authentic
processing of the HIV1 gag precursor in vitro.

The HIV2 gag polyprotein has a calculated molecular
weight of 57,100 and is 20 amino acids larger than its HIV1
counterpart (3, 6). The HIV1 and HIV2 gag precursors share
58% identity in their amino acid sequences. Processing of the
HIV1 Pr53gag by the HIV2 protease suggests that the HIV1
and HIV2 gag polyproteins adopt similar conformations that
allow the recognition and hydrolysis of a select number of
peptide bonds within the polyprotein precursor. Synthetic
peptides with sequences of the HIV1 matrix-capsid junction
and the protease-reverse transcriptase junction were shown
to be substrates for the HIV2 protease in quantitative assays
using purified protease whose concentration was verified by
active site titration. This permits an accurate comparison of
the specificities of the HIV1 and HIV2 enzymes on the same
substrate. The HIV2 protease hydrolyzes the octapeptide
substrate with a k_cat of 65 min^-1 and a K_m of 2.8 mM. The
similarity of these kinetic parameters to those described for
the HIV1 protease on the identical substrate (k_cat = 78 min^-1,
K_m = 1.5 mM) (45) shows the analogous substrate specificity
of the two enzymes and confirms the observation of efficient
HIV2 protease processing of Pr53gag in vitro. A detailed
analysis of the HIV1 and HIV2 substrate specificities on
various synthetic peptides is described elsewhere (49). These
results suggest that active site-directed inhibitors currently
being developed against the HIV1 protease may also serve as
effective inhibitors of the HIV2 protease. Preliminary results
using the purified HIV2 protease and synthetic inhibitors
support this proposal (49).

The expression systems described here are shown to be 
useful in the production of a mature, active HIV2 protease. 
These systems may also prove to be useful for the expression 
of variant forms of the HIV2 protease as well as related 
aspartyl proteases. The large scale production of active HIV2 
protease achieved by using the yeast expression system will 
facilitate its purification as well as permit the development of 
sensitive and efficient in vitro assays for this protein. The 
availability of reagent levels of homogeneous material will 
also permit biophysical analysis of the retroviral enzyme to 
provide a better understanding of structure/function relation- 
ships and aid in the rational design of inhibitors and eventual 
antiviral pharmaceuticals. 

SDS Polyacrylamide Gel Electrophoresis

A related technique is polyacrylamide gel electrophoresis, wherein the
rate of migration through a gel with small pores is dependent upon the
size of the protein. As with the other techniques discussed, the shape of
the protein will also determine the rate of migration. An additional factor
in this case is the net charge of the protein, since electrophoresis is the
driving force. These two factors may be determined by varying the sieving
effect of the gel, achieved by altering the concentration of acrylamide and
the cross-linking agent, methylene-bis-acrylamide; all other parameters are
kept constant so that the shape and net charge do not vary.

A much more satisfactory approach is to abolish all shape and charge
differences of proteins by adding the denaturant sodium dodecyl sulfate
(SDS):

CH3-(CH2)11-SO4^- Na^+    (1-58)

This approach has arisen from a series of empirical observations, so that
its detailed physical basis is not yet fully understood. Binding studies of a
variety of different proteins have shown that above an SDS monomer
concentration of 8 x 10^-4 M, a constant 1.4 grams of SDS is bound
per gram of protein, i.e., one molecule of SDS for every two amino acid
residues of the chain. This high level of binding of the charged detergent
and the constant binding ratio will generally "swamp out" the intrinsic
charge contribution of most proteins, so that an approximately constant
negative charge per unit mass will be obtained. All polypeptides also appear
to have a similar shape when SDS is bound, generally considered as
elongated particles, with a constant diameter and a length proportional to
the number of amino acid residues in the polypeptide chain. The exact
nature of the protein-SDS complex is not known; none of the models
proposed is entirely consistent with the many experimental observations.
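
The equivalence between 1.4 grams of SDS per gram of protein and one SDS molecule per two residues can be checked with a short calculation. The sketch below assumes a mean residue mass of 110 g/mol, a common rule of thumb that is not stated in the text, and uses the molar mass of sodium dodecyl sulfate (about 288.4 g/mol).

```python
SDS_MOLAR_MASS = 288.4     # g/mol, sodium dodecyl sulfate (C12H25NaO4S)
MEAN_RESIDUE_MASS = 110.0  # g/mol, ASSUMED average amino acid residue mass
SDS_BOUND_G_PER_G = 1.4    # grams of SDS per gram of protein, from the text


def sds_molecules_per_residue(grams_per_gram, residue_mass, sds_mass):
    """Convert a mass binding ratio into SDS molecules bound per residue."""
    return grams_per_gram * residue_mass / sds_mass


ratio = sds_molecules_per_residue(
    SDS_BOUND_G_PER_G, MEAN_RESIDUE_MASS, SDS_MOLAR_MASS
)
print(f"SDS molecules per residue: {ratio:.2f}")
print(f"Residues per SDS molecule: {1 / ratio:.1f}")
```

Under that assumption the ratio comes out slightly above one half, which agrees with the figure of one SDS molecule for every two residues.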

Nevertheless, the constant net charge and shape lead to SDS-protein
complexes having electrophoretic mobilities in polyacrylamide gels that are
generally directly proportional to the logarithm of the length of the
polypeptide chain (Figure 1-5). By comparing the mobility of an unknown
protein with that of a set of standard marker proteins, the molecular weight
of the unknown polypeptide chain may usually be determined within 10
per cent of the true value. In certain cases, abnormalities in SDS binding
or protein conformation, large differences in intrinsic protein charge, or
covalently attached nonprotein moieties may lead to increased or decreased
electrophoretic mobilities; therefore, caution is advisable in use of
this technique. Nevertheless, the remarkable resolution and ease of SDS
electrophoresis have made it a widely used method for determining
polypeptide chain sizes. It is especially useful in
studies of processes in which the molecular weights of proteins are altered.
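
The marker comparison described above can be sketched as a calibration line. This is a minimal illustration, assuming the usual working approximation that log10(molecular weight) falls linearly with relative mobility over the useful range of the gel. The marker weights and mobilities below are hypothetical values chosen for the example, not data from the text.

```python
import math

# HYPOTHETICAL marker proteins: (molecular weight, relative mobility Rf).
MARKERS = [
    (66000, 0.20),
    (45000, 0.35),
    (29000, 0.52),
    (20100, 0.66),
    (14200, 0.80),
]


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mw(rf, slope, intercept):
    """Read a molecular weight off the calibration line for mobility rf."""
    return 10 ** (intercept + slope * rf)


mobilities = [rf for _, rf in MARKERS]
log_weights = [math.log10(mw) for mw, _ in MARKERS]
slope, intercept = fit_line(mobilities, log_weights)

unknown_rf = 0.60  # hypothetical mobility of the unknown band
unknown_mw = estimate_mw(unknown_rf, slope, intercept)
print(f"Estimated molecular weight: {unknown_mw:.0f}")
```

The estimate is only as good as the assumption that the unknown binds SDS and migrates like the markers, which is the caution raised in the paragraph above.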



Molecular weight estimation of polypeptide chains by electrophoresis in SDS- 
polyacrylamide gels. A. L. Shapiro, et al. Biochem. Biophys. Res. Commun. 
28:815-820, 1967. 

Front: The background photograph of the cover is of a Laue x-ray diffraction
pattern produced by a crystal of the plant enzyme ribulose bisphosphate
carboxylase. This technique is described in Chapter 17. Information derived
from such x-ray patterns, together with a knowledge of the amino acid
sequence, enabled the three-dimensional arrangement of atoms in the protein
to be determined. A simplified representation of this protein structure is shown
in color, superimposed on the diffraction pattern. The enzyme, which is
involved in the fixation of carbon dioxide, is a member of the large class of
α/β barrel protein structures. This class of structures is discussed in detail in
Chapter 4.

Back: Tomato bushy stunt virus is a spherical virus made from 180 protein
subunits. Arms extending from sixty of these subunits contribute to an internal
framework that determines the size of the correctly assembled virus particle. The
interdigitated arms from three subunits meet at each of the twenty icosahedral
threefold axes of the virus. One such axis is shown here with the β strands from
three subunits shown in different shades of green. Virus structure is described
in more detail in Chapter 11.

Figure 13.4 X-ray diffraction pattern from
crystals of a membrane-bound protein, the 
bacterial photosynthetic reaction center. 
(Courtesy of Hartmut Michel.) 




many displayed the x-ray diagram shown in Figure 13.4. Not only was this the 
first x-ray picture to high resolution of a membrane protein, but the crystal was 
formed not from a small protein of trivial function but from a large complex of 
polypeptide chains that represents a class of proteins having a function of 
central importance for life on earth. The protein complex was a photosynthetic 
reaction center from the photosynthetic bacterium Rhodopseudomonas viridis,
which converts the energy of captured sunlight into electrical and chemical
energy in the first steps of photosynthesis. The structure has subsequently been
solved to 2.5 Å resolution by H. Michel in collaboration with Hans Deisenhofer
and Robert Huber at the same institute. 

The interiors of Rhodopseudomonad bacteria are filled with photosynthetic 
vesicles, which are hollow membrane-enveloped spheres. The photosynthetic 
reaction centers are embedded in the membrane of these vesicles. One end of 
the protein complex faces the inside of the vesicles, which is known as the 
periplasmic side; the other end faces the cytoplasm of the cell. Around each
reaction center, on the periplasmic side within the vesicles, there are hundreds 
of small membrane proteins, the antenna pigment protein molecules with 
bound chlorophyll. These catch photons over a wide area and funnel them to 
the reaction center. By this arrangement the reaction center can utilize about a 
hundred times more photons than those that directly strike the special pair of 
chlorophyll molecules at the heart of the reaction center. 

The reaction center is built up from four polypeptide chains, three of which 
are called L, M, and H because they have light, medium, and heavy molecular 
masses as deduced from their electrophoretic mobility on SDS-PAGE. Subse- 
quent amino acid sequence determinations showed, however, that the H chain 
is in fact the smallest with 258 amino acids, followed by the L chain with 273 
amino acids. The M chain is the largest polypeptide with 323 amino acids. This 
discrepancy between apparent relative masses and real molecular weights 
underlines the uncertainty in deducing molecular masses of membrane-bound 
proteins from their mobility in electrophoretic gels. 
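
The discrepancy can be made concrete by estimating chain masses from the residue counts given above. The sketch assumes a mean residue mass of 110 Da, a rule of thumb that is not from the text, and ignores bound pigments and any modifications, so the numbers are rough.

```python
MEAN_RESIDUE_MASS_DA = 110.0  # ASSUMED average residue mass in daltons

# Residue counts quoted in the text for the three named chains.
RESIDUES = {"H": 258, "L": 273, "M": 323}


def approximate_mass_kda(n_residues, residue_mass=MEAN_RESIDUE_MASS_DA):
    """Rough polypeptide mass in kDa from its residue count."""
    return n_residues * residue_mass / 1000.0


masses = {chain: approximate_mass_kda(n) for chain, n in RESIDUES.items()}
by_sequence = sorted(masses, key=masses.get)  # lightest to heaviest

for chain in by_sequence:
    print(f"{chain}: about {masses[chain]:.1f} kDa")
```

Ordered by sequence, the chains run H, L, M from lightest to heaviest, whereas the names assigned from SDS-PAGE mobility imply L, M, H.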

The L and M subunits show sequence identity of about 25% and are 
therefore homologous and evolutionarily related proteins. The H subunit, on 
the other hand, has a completely different sequence. The fourth subunit of the 
reaction center is a cytochrome that has 336 amino acids with a sequence that 
is not homologous to any other known cytochrome sequence. 

In addition to these polypeptide chains, the reaction center contains a 
number of pigments. There are four bacteriochlorophyll molecules (Figure 

Proteolytic processing of the dengue virus polyprotein is mediated by host cell proteases and the virus-encoded
NS2B-NS3 two-component protease. The NS3 protease represents an attractive target for the development
of antiviral inhibitors. The three-dimensional structure of the NS3 protease domain has been determined,
but the structural determinants necessary for activation of the enzyme by the NS2B cofactor have been
characterized only to a limited extent. To test a possible functional role of the recently proposed Φx3Φ motif
in NS3 protease activation, we targeted six residues within the NS2B cofactor by site-specific mutagenesis.
Residues Trp62, Ser71, Leu75, Ile77, Thr78, and Ile79 in NS2B were replaced with alanine, and in addition, an
L75A/I79A double mutant was generated. The effects of these mutations on the activity of the NS2B(H)-NS3pro
protease were analyzed in vitro by sodium dodecyl sulfate-polyacrylamide gel electrophoresis analysis of
autoproteolytic cleavage at the NS2B/NS3 site and by assay of the enzyme with the fluorogenic peptide
substrate GRR-AMC. Compared to the wild type, the L75A, I77A, and I79A mutants demonstrated inefficient
autoproteolysis, whereas in the W62A and the L75A/I79A mutants self-cleavage appeared to be almost completely
abolished. With the exception of the S71A mutant, which had a k_cat/K_m value for the GRR-AMC peptide
similar to that of the wild type, all other mutants exhibited drastically reduced k_cat values. These results
indicate a pivotal function of conserved residues Trp62, Leu75, and Ile79 in the NS2B cofactor in the structural
activation of the dengue virus NS3 serine protease.



Infection by dengue viruses is now widely recognized as a 
major public health concern, with more than 1 million cases of 
dengue hemorrhagic fever per year and case fatality rates rang- 
ing from 1 to 10% (23). There are four serotypes of dengue 
virus, which cause dengue hemorrhagic fever and dengue 
shock syndrome (21, 22). Dengue viruses, members of the 
Flaviviridae family, are small, enveloped, positive-stranded 
RNA viruses which are transmitted by Aedes mosquitoes (7). 
At present, neither a commercial vaccine nor a causative treat- 
ment is available for the prevention or cure of acute dengue 
virus diseases. 

The genomic RNA of dengue virus serotype 2 contains 
10,723 nucleotides and encodes a large polyprotein precursor 
of 3,391 amino acid residues which consists of three structural 
proteins (C, prM, and E) and seven nonstructural proteins 
(NS1, NS2A, NS2B, NS3, NS4A, NS4B, and NS5) (26). The 
polyprotein is co- and posttranslationally processed by pro- 
teases of the host cell and the virus-encoded two-component 
protease NS2B-NS3 to generate individual viral proteins (11, 
18). Optimal activity of the NS3 serine protease (flavivirin, EC
3.4.21.91) is an essential requirement for maturation of the
virus, and inhibition of this enzyme offers the prospect of an
effective antiviral chemotherapy for severe cases of dengue
hemorrhagic fever and dengue shock syndrome (for review see
references 6, 39, and 41 and references therein).

The NS2B-NS3 two-component protease mediates cleavage
in the nonstructural region of the viral polyprotein at the
NS2A/NS2B, NS2B/NS3, NS3/NS4A, and NS4B/NS5 junctions.
Additional cleavages within the C, NS2A, and NS4A
proteins and within a C-terminal portion of NS3 itself have been
described in the literature (1, 34, 35, 46). With the exception of
the NS2B/NS3 junction, which contains a glutamine residue at
the P2 position, the cleavage sites for the NS3 protease consist
of a pair of basic amino acids (Arg or Lys) at the P1 and P2
positions, followed by a short-chain amino acid (Gly, Ala, or
Ser) at the P1' site (12, 13, 40, 48, 51). The minimum domain
size required for protease activity of the 69-kDa NS3 protein
has been mapped to 167 residues at the N terminus (33). Based
on sequence comparisons with known serine proteases, a catalytic
triad composed of residues His51, Asp75, and Ser135
was identified, and replacement of the catalytic Ser135 residue
by alanine resulted in an enzymatically inactive NS3 protease
(47). The C-terminal two-thirds of the dengue virus NS3 protein
are associated with the enzymatic functions of a nucleoside
triphosphatase and RNA helicase (20, 27). The three-dimensional
structure of the NS3 protease domain (NS3pro)
encompassing the N-terminal 185 amino acids has been resolved
by X-ray crystallography, and the protein exhibits the
six-stranded β-barrel conformation typical of chymotrypsin-like
serine proteases (31).
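
The consensus cleavage site described above (Arg or Lys at P2 and P1, followed by Gly, Ala, or Ser at P1') can be expressed as a simple pattern search. The sketch below is illustrative only: the test sequence is hypothetical, and a pattern match says nothing about whether a site is actually cleaved.

```python
import re

# P2 and P1 basic (Arg/Lys), P1' small (Gly/Ala/Ser), as described in the text.
# The lookahead lets overlapping candidate sites be reported.
DIBASIC_SITE = re.compile(r"(?=([RK][RK][GAS]))")


def candidate_cleavage_positions(protein):
    """Return 0-based indices of the P1' residue for each candidate site.

    The scissile bond lies between index - 1 (P1) and index (P1').
    """
    return [m.start() + 2 for m in DIBASIC_SITE.finditer(protein)]


# HYPOTHETICAL sequence, written only to exercise the pattern.
example = "MTKRGLEQRSAAKKAPL"
sites = candidate_cleavage_positions(example)
print(sites)
```

In the example, the QRS stretch is not reported because glutamine at P2 falls outside the dibasic rule, which mirrors the exception noted for the NS2B/NS3 junction.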

The presence of a small activating protein or cofactor is a
prerequisite for optimal catalytic activity of the flaviviral proteases
with natural polyprotein substrates (4, 13). Although the
dengue virus NS3 protease exhibits NS2B-independent activity
with model substrates for serine proteases such as
N-α-benzoyl-L-arginine-p-nitroanilide, enzymatic cleavage of dibasic peptides
is markedly enhanced with the NS2B-NS3 cocomplex,
and the presence of the NS2B cofactor was shown to be an
absolute requirement for trans cleavage of a cloned polyprotein
substrate (50). Intramolecular cleavage at the NS2B/NS3
site conducive to the formation of a noncovalent complex was
observed with the NS2B(H)-NS3pro molecule after purification
from overexpressing Escherichia coli and subsequent refolding
(50).

A genetically engineered NS2B(H)-NS3pro protease con- 
taining a noncleavable nonamer glycine linker between the 
NS2B activation sequence and the protease moiety exhibited 
higher specific activity with para-nitroanilide peptide sub- 
strates than the NS2B(H)-NS3pro molecule (32). Recently we 
have shown that the NS2B-NS3pro protease incorporating a 
full-length NS2B cofactor sequence could catalyze the cleavage 
of 12-mer peptide substrates representing native polyprotein 
junctions (28, 29). However, this protein appeared to be com- 
pletely resistant to proteolytic self-cleavage. 

The initial characterization of the cofactor requirement for
the dengue virus NS3 protease revealed that the minimal region
required for protease activity was located in a 40-residue
central hydrophilic segment of NS2B spanning residues Leu54
to Glu93 (19). Mutagenesis experiments with the yellow fever
virus NS2B protein demonstrated that specific residues within
this core sequence are critical for protease activation (14).
Deletion of residues 51 to 55, 53 to 55, and 56 to 93 within the
conserved central domain yielded no detectable processing of
an NS2B-NS3pro polyprotein precursor, whereas a four-amino-acid
deletion of the sequence 67 ISGS 70 generated a protease
with significantly reduced cleavage efficiency. Directed
mutagenesis within the yellow fever virus NS2B protein confirmed
a structural role for the N-terminal region of the conserved
cofactor segment (17). Mutations within a charged N-terminal
cluster comprising residues 52 ELKK 55 impaired cis
cleavage activity at the NS2B/NS3 site, and deletion analysis
revealed that the conserved domain alone provided only basal
cofactor activity, while the optimal function of the cofactor
required both hydrophobic flanking regions of NS2B.

A significant reduction of NS3 cleavage activity was ob- 
served for the alanine substitutions at residues Val95 and 
Gln96 within the dengue virus NS3 protease sequence (37). It 
was proposed that these two residues are located at the C 
terminus of the NS2B binding cleft and that they are involved 
in precleavage association of NS2B with NS3 and proper pro- 
cessing at the NS2B/NS3 site. 

An essential requirement for the activation of the protease is 
the presence of hydrophobic residues in the cofactor, which 
may act as an anchor in the enzyme-cofactor complex. Within 
the hepatitis C virus NS4A cofactor, two residues, Ile25 and 
Ile29, are critical for complete activation of the NS3 protease 
(9, 44, 45). For the NS4A cofactor from GB virus, a minimum 
region which supports NS3 protease activity was mapped to a 
sequence spanning residues Phe22 to Val36, and two central 
residues, Val27 and Trp31, were indispensable for maximal 
proteolytic activity (10). 

A peptide comprising residues Ser69 to Glu81 of the dengue
virus NS2B cofactor was recently proposed as a substitute for
the cofactor (8); however, it failed to reconstitute catalytic
activity of the NS3pro protease in vitro (32). Therefore, it
seems likely that additional residues located farther toward the N
terminus of the NS2B core sequence play a role in NS3 activation.

These findings were suggestive of a common structural motif
involved in activation of the flaviviral proteases. The Φx3Φ
motif is composed of two bulky hydrophobic residues separated
by three unspecified residues, and it was speculated that
additional residues located outside this sequence motif would
contribute to the stringent specificity of the protease for the
corresponding polyprotein substrate (10).
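
The Φx3Φ definition above translates directly into a sequence search. The sketch below is a hedged illustration: the text does not list which residues count as "bulky hydrophobic", so the set F, I, L, M, V, W, Y is an assumption, and the test peptide is hypothetical.

```python
import re

# ASSUMED set of bulky hydrophobic residues (not specified in the text).
BULKY_HYDROPHOBIC = "FILMVWY"

# Two such residues separated by any three residues; the lookahead
# allows overlapping occurrences to be found.
PHI_X3_PHI = re.compile(
    rf"(?=([{BULKY_HYDROPHOBIC}].{{3}}[{BULKY_HYDROPHOBIC}]))"
)


def find_phi_x3_phi(protein):
    """Return (first, second) 0-based index pairs for each motif occurrence."""
    return [(m.start(), m.start() + 4) for m in PHI_X3_PHI.finditer(protein)]


# HYPOTHETICAL peptide used only to exercise the search.
peptide = "AVKDLWEGSIAQ"
hits = find_phi_x3_phi(peptide)
print(hits)
```

With this residue set the spacing matches the pairs named in the text, such as Ile25/Ile29, Val27/Trp31, and Leu75/Ile79, each of which lies four positions apart.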

Substantial biochemical data were accumulated for the hep- 
atitis C virus protease which offer some structural and mech- 
anistic explanations for the activation of this flaviviral protease 
by its cofactor. Interaction of the NS3 protease with the NS4A 
cofactor was shown to affect the folding of the NS3 protease, 
resulting in conformational rearrangements of the N-terminal 
28 residues of the protease and a strand displacement conducive
to the formation of a well-ordered array of three β-sheets
in which the cofactor becomes an integral part of the protease 
fold (30, 49). The result of these conformational changes is a 
reorientation of the residues of the catalytic triad which is 
more favorable for proton shuttling during catalysis. A number 
of studies provided evidence that structural rearrangements 
leading to a fully activated protease are induced not only by 
binding of the cofactor but also by the substrate, as shown for 
the competitive inhibition of the NS3 protease by its cleavage 
products (2, 3, 24, 25, 38). Based on these findings, the hepa- 
titis C virus enzyme has been described as an induced-fit pro- 
tease (6, 39). However, analogies to the hepatitis C virus sys- 
tem should be treated with caution, since preliminary data 
obtained with the dengue virus and the GB virus enzymes are 
indicative of major structural differences in the activation pro- 
cess (10, 50). 

We demonstrate in this report that alanine substitutions at
residues Trp62, Leu75, and Ile79 in the dengue virus NS2B
cofactor result in marked effects on autoprocessing at the
NS2B/NS3 site and that activity of the mutant NS3 proteases
with the synthetic peptide substrate is mainly affected by significantly
reduced k_cat values. To analyze the structure-activity
relationships which we have observed experimentally, we generated
a molecular model for the NS2B/NS3 cocomplex based
on homology to the hepatitis C virus NS3/NS4A protease.

MATERIALS AND METHODS 

Construction of pTH/NS2B(H)-NS3pro by SOE-PCR. The recombinant plasmid
encoding the dengue virus serotype 2 NS2B(H)-NS3pro protein was generated
with the previously described plasmid pTH/NS2B-NS3 as a template for
splicing by overlap extension (SOE)-PCR (15). The sequence for NS2B(H) was
obtained with primers 5'-TGCTCACTGGAGGATCCGCCGATTTGGAACTGGAG-3'
(nucleotides 4259 to 4293 in dengue virus serotype 2) and
5'-CTTCACTTCCCACAGGTACCACAGTGTTTGTTCTTCCTC-3' (nucleotides 4399
to 4416). For amplification of NS3pro, primers
5'-GAACAAACACTGTGGTACCTGTGGGAAGTGAAGAAAC-3' (nucleotides 4492 to 4516) and
5'-CTTCTCTTTCAGGATCCCTAATCTTCGATCTCTGGGTTG-3' (nucleotides 5043
to 5081) were used.

SOE-PCR was performed with a combination of both templates with the
NS2B(H) forward and the NS3pro reverse primers incorporating an overlapping
region of 33 nucleotides. The product of SOE-PCR comprises the sequence of
NS2B from amino acid residues 48 to 95 followed by residues 121 to 130 and 180
N-terminal residues of the NS3 protease domain. The PCR product was cut with
BamHI and cloned into the pTrcHisA expression vector (Invitrogen) to yield the
polyhistidine-tagged fusion protein. The sequence of the resulting construct was
verified by DNA sequencing with an ABI Prism model 377 sequencer with a dye
terminator cycle sequencing reaction kit (Perkin Elmer).
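
The 33-nucleotide overlap and the BamHI sites can be checked from the primer sequences themselves. In the sketch below the primers are transcribed from the text with spacing removed, and one unreadable character in the NS2B(H) reverse primer is read as "TT"; that reading is an assumption. The function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Primers as given in the text, written 5' to 3'.
NS2B_FWD = "TGCTCACTGGAGGATCCGCCGATTTGGAACTGGAG"
NS2B_REV = "CTTCACTTCCCACAGGTACCACAGTGTTTGTTCTTCCTC"  # "TT" at 7-8 is ASSUMED
NS3_FWD = "GAACAAACACTGTGGTACCTGTGGGAAGTGAAGAAAC"
NS3_REV = "CTTCTCTTTCAGGATCCCTAATCTTCGATCTCTGGGTTG"

BAMHI_SITE = "GGATCC"


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def overlap_length(upstream_reverse, downstream_forward):
    """Longest top-strand overlap between two SOE-PCR fragments.

    The upstream fragment ends with the reverse complement of its
    reverse primer; the downstream fragment starts with its forward primer.
    """
    upstream_end = reverse_complement(upstream_reverse)
    longest = min(len(upstream_end), len(downstream_forward))
    for n in range(longest, 0, -1):
        if upstream_end.endswith(downstream_forward[:n]):
            return n
    return 0


overlap = overlap_length(NS2B_REV, NS3_FWD)
print(f"Overlap between fragments: {overlap} nt")
print("BamHI site in outer primers:",
      BAMHI_SITE in NS2B_FWD, BAMHI_SITE in NS3_REV)
```

Both outer primers carry a GGATCC site, consistent with cutting the spliced product with BamHI before cloning.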

Mutation constructs. Alanine substitutions were introduced in the NS2B sequence
at residues Trp62, Ser71, Leu75, Ile77, Thr78, and Ile79 with the
QuikChange site-directed mutagenesis kit (Stratagene) following the manufacturer's
directions. In addition, a double Leu75/Ile79 mutant was generated with
the L75A mutant as the template for PCR with Ile79 mutagenic oligonucleotides.
Additional base changes creating restriction sites suitable for screening of the
resulting mutant constructs were introduced in the primer sequences.

The following pairs of forward and reverse primers were used for mutagenesis 
(bold letters indicate changed nucleotides, and italic letters represent restriction 
sites): W62A, 5'-CGATGTCAA/lGCTGAAGACCAGGCAGAG-3' and 5'-CT 
CTG CCTG GTCTTG4 C C71TG A CATCG - 3 ' ; S71A, 5' GAGATATCAGG^G 
C7AGTCCAATCC-3' and 5'-GGATTGGACT4CCrCCTGATATCTC-3'; 
L75A, 5 '-GCAGTCCG/1 TCGCGTCAATAACAATATCAG-3 ' and 5'-CTGAT 
ATTGTTATTGACGCG/17CGGACTGC-3' ; I77A, 5'CCAATCCTGTOfGC 
7A C AATCTC A G AA G ATG G -3 ' and 5 ' - CC ATCTTCTG AT ATTGT A G C 7G A 
CAGGATTGG-3' ; T78A, 5'-CCAATCCTGTCAATAGCAATCTG4G/4AGA 
TGG-3' and 5 ' -GC ATCTTCTG A G>4 TTG CT ATTG A C A G G ATTG G - 3 ' ; and 
I79A, 5 '-TCAATAACAGCCTC/1G AAG ATGGTAGC-3 ' and 5'-GCTACCAT 
CVTCTGA GGCTGTTATTGAC-3'.

The catalytically inactive NS3 protease mutant with an S135A substitution was 
obtained as described earlier (28). Plasmid DNA from the mutants was analyzed 
by DNA sequencing to confirm that only the desired mutation was incorporated. 

Expression and purification of protease constructs. The pTrcHis plasmids
containing the recombinant NS2B(H)-NS3pro sequences were transformed into
Escherichia coli C41(DE3). Transformants were grown in Luria broth (LB)
medium supplemented with ampicillin (100 µg/ml) at 37°C. At an A600 of 0.5,
isopropyl-1-thio-β-D-galactopyranoside was added to 0.1 mM, and the culture
was grown at 37°C for 8 h. Cells were harvested by centrifugation (5,000 x g, 10
min, 4°C), resuspended in 20 ml of lysis buffer A (100 mM Tris-HCl, pH 7.5, 300
mM NaCl), and lysed with a French pressure cell at 14,000 lb/in^2. The lysate was
clarified by centrifugation (10,000 x g, 30 min, 4°C), and the pellet fraction was
washed two times with lysis buffer containing 1% Triton X-100.

Inclusion bodies were suspended in 15 ml of denaturing buffer B (100 mM
Tris-HCl, pH 8.0, 300 mM NaCl, 8 M urea) followed by sonication (10 bursts at
power setting 3 for 15 s) with an ultrasonic processor (Misonix). The suspension
was centrifuged (10,000 x g, 30 min, 4°C), and the supernatant was loaded on a
Hitrap chelating column (Pharmacia) equilibrated with denaturing buffer. The
column was washed with 30 column volumes of denaturing buffer containing 20
mM imidazole and eluted at a flow rate of 0.5 ml min^-1 with denaturing buffer
containing 50 mM imidazole. Fractions of 1 ml were collected, and aliquots were
analyzed for the presence of NS2B(H)-NS3pro by sodium dodecyl sulfate (SDS)-polyacrylamide
gel electrophoresis (PAGE) on 15% polyacrylamide gels.

Peak fractions were pooled and loaded on a Superdex 200 HR 10/30 gel
filtration column (Pharmacia). The column was eluted with denaturing buffer at
a flow rate of 0.3 ml min^-1, and the fractions containing NS2B(H)-NS3pro, as
analyzed by SDS-PAGE, were pooled and diluted with the same buffer to 0.5 mg
ml^-1. Refolding of the protein was initiated by stepwise dialysis of 1-ml samples
with dialysis tubing (cutoff, 8 kDa) at 4°C against three changes of 100 mM
Tris-HCl, pH 8.0-300 mM NaCl (200 ml) and one change against 200 ml of 100
mM Tris-HCl, pH 9.0-50 mM NaCl (buffer C). The dialysate was centrifuged
(10,000 x g, 10 min, 4°C), and the protein concentration was determined with a
Bradford protein assay kit (Bio-Rad). Preparations of the NS2B(H)-NS3pro
protein were stored at -20°C in 100 mM Tris-HCl, pH 9.0-50 mM NaCl-50%
glycerol.

Determination of NS3 protease activity. Autocleavage of NS2B(H)-NS3pro at 
the NS2B/NS3 site was monitored by Tricine-SDS-PAGE (43). Samples containing 
0.5 µg µl⁻¹ of purified NS2B(H)-NS3pro in buffer C were incubated at 37°C, 
and aliquots of 20 µl were removed at fixed intervals. The reaction was quenched 
by the addition of 7 µl of SDS-PAGE sample buffer (200 mM Tris-HCl, pH 7.5, 
4% [wt/vol] SDS, 40% glycerol, 0.1% bromophenol blue, 100 mM dithiothreitol), 
and precursor processing was quantitated by densitometry analysis of band 
intensities obtained from SDS-PAGE with the ONE-D scan program (Scanalytics). 
Cleavage at the NS2B/NS3 site was confirmed by automated Edman amino 
acid sequencing of the protease fragment NS3pro and by Western blot analysis 
with anti-Xpress antibodies (Invitrogen) of the N-terminal cleavage fragment 
(His)6-NS2B(H). 
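
For orientation, the mass concentration used in the autocleavage assay can be converted to a molar concentration. The short Python sketch below does this under the assumption that the 29.8-kDa mass reported in the Results for the (His)6-tagged precursor applies to the purified protein; the function name is hypothetical and only illustrates the arithmetic.

```python
def mass_to_molar_um(conc_ug_per_ul, mass_kda):
    """Convert a mass concentration (ug/ul) to micromolar.

    1 ug/ul equals 1 g/l, and 1 kDa corresponds to 1000 g/mol.
    """
    grams_per_litre = conc_ug_per_ul
    mol_per_litre = grams_per_litre / (mass_kda * 1000.0)
    return mol_per_litre * 1e6


# 0.5 ug/ul of precursor, assuming a molecular mass of 29.8 kDa.
autocleavage_um = mass_to_molar_um(0.5, 29.8)
print(f"Autocleavage assay: about {autocleavage_um:.1f} uM precursor")
```

Under this assumption the autocleavage samples are roughly two orders of magnitude more concentrated than the 0.15 µM enzyme used in the fluorogenic peptide assay.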

FIG. 1. Organization of the dengue virus polyprotein and the 
NS2B(H)-NS3(pro) construct. (A) Sites on the dengue virus polyprotein 
cleaved by host-encoded proteases (V) and the virus-encoded 
two-component protease NS2B-NS3 (T). The NS3 protease domain 
(NS3pro) is shaded, and the 40-residue minimum cofactor region 
which supports catalytic activity of the NS3 protease is indicated by a 
bar. (B) Structural organization of the expression construct 
NS2B(H)-NS3pro as generated by SOE-PCR. The construct contains the 
N-terminal polyhistidine tag for purification purposes, the central 
activation sequence of NS2B from residues 48 to 95 followed by residues 121 
to 130, and 180 N-terminal residues of NS3 representing the protease 
domain. The residues of the catalytic triad, His51, Asp75, and Ser135, 
are indicated, and the sequence of the native NS2B/NS3 polyprotein 
cleavage junction is shown. 

The fluorogenic substrate GRR-AMC (Peptide International) was used for 
the in vitro assay of NS3. The assay was performed in 96-well microtiter plates 
with a Labsystems Fluoroscan II (Labsystems) at an excitation wavelength of 355 
nm and emission wavelength of 460 nm at 37°C. Assay reactions contained, in a 
100-µl final volume, NS2B(H)-NS3pro and the mutant proteins at a concentration 
of 0.15 µM in 100 mM Tris-HCl, pH 9.0. The substrate concentration was 
varied between 10 and 250 µM, and signals were converted to concentrations by 
comparison with standard amounts of free AMC. K_m and V_max values were 
obtained from measurements of initial velocities prior to 10% substrate 
depletion, assuming that 100% of the protease was enzymatically active. To 
determine K_m and V_max values, Michaelis-Menten kinetics, 
v = V_max[S]/([S] + K_m), were transformed into double reciprocal 
Lineweaver-Burk plots by nonlinear regression analysis with the GraphPad 
Prism software. Three independent experiments were carried out for each set 
of data points, and data are reported as mean ± standard error. 
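
As a rough illustration of the double-reciprocal analysis described above, the following Python sketch fits a straight line to 1/v versus 1/[S] and reads V_max and K_m from the intercept and slope. It is not the GraphPad Prism procedure used in the study. The substrate series, the velocities, and the function names are hypothetical, and the data are synthetic and noise-free, generated from assumed parameters chosen to resemble the wild-type enzyme.

```python
def michaelis_menten(s, v_max, k_m):
    """Initial velocity v = V_max*[S] / ([S] + K_m)."""
    return v_max * s / (s + k_m)


def lineweaver_burk_fit(substrate, velocity):
    """Least-squares line through (1/[S], 1/v); returns (K_m, V_max).

    The line is 1/v = (K_m/V_max) * (1/[S]) + 1/V_max.
    """
    xs = [1.0 / s for s in substrate]
    ys = [1.0 / v for v in velocity]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # K_m / V_max
    intercept = mean_y - slope * mean_x    # 1 / V_max
    v_max = 1.0 / intercept
    k_m = slope * v_max
    return k_m, v_max


# Hypothetical substrate series (uM) spanning the 10 to 250 uM range.
substrate_um = [10, 25, 50, 100, 150, 250]

# Assumed "true" parameters used only to generate synthetic velocities.
TRUE_KM_UM = 146.0      # uM
TRUE_VMAX = 0.18        # uM per min (illustrative)
ENZYME_UM = 0.15        # uM, treated as 100% active

velocity = [michaelis_menten(s, TRUE_VMAX, TRUE_KM_UM) for s in substrate_um]

k_m, v_max = lineweaver_burk_fit(substrate_um, velocity)
k_cat = v_max / ENZYME_UM   # per min, valid only if all enzyme is active
print(f"K_m = {k_m:.1f} uM, V_max = {v_max:.3f} uM/min, k_cat = {k_cat:.2f} per min")
```

With noise-free input the fit returns the assumed parameters. With real data the reciprocal transform amplifies errors at low substrate concentrations, which is one reason a direct nonlinear fit of the Michaelis-Menten equation is often preferred.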

Coordinates and molecular modeling. The crystal structures of the dengue 
virus type 2 NS3 serine protease (protein database identifier 1BEF) (31) and the 
hepatitis C virus NS3/NS4A complex (protein database identifier 1JXP) (49) 
were obtained from the protein database (Brookhaven National Laboratory). To 
obtain a structure for the dengue virus NS2B core segment in the complex with 
the NS3 protease, the NS2B peptide (residues 56 to 93) was initially aligned with 
the NS4A peptide (residues 21 to 32) according to the published sequence 
comparison (10). The structure for the NS2B peptide was generated with the 
Modeller 6v2 software (42). The dengue virus serotype 2 protease domain was 
superimposed on the hepatitis C virus protease domain, and the structures of the 
NS3 protease and the NS2B peptide were combined. Several 1-ns molecular 
dynamics trajectories of NS3pro in complex with NS2B were generated. The 
simulation of the resulting structure was performed with the Gromacs simulation 
package (http://www.gromacs.org) with the GROMOS96 force field in a single-point 
charge water box. The space between protein and box walls was set to a minimum 
distance of 7 Å, and the system was energy minimized with the steepest descent. 
The simulated structures were visualized by Deepview Swiss-Pdb Viewer v3.5b4. 
The final model was evaluated by a Ramachandran plot, and 98.2% of the 
nonglycine residues were in allowed conformations. 



RESULTS 

FIG. 2. Sequence alignment of the conserved central domain 
within the NS2B cofactors of members of the Flaviviridae family. 
Numbers on the left indicate positions of amino acid residues in the 
polyprotein sequence. Residues which are identical in all sequences 
are shaded in dark grey, and conserved residues are shaded in light 
grey. The location of the Φx3Φ motif is indicated above the alignment, 
and residues within the dengue virus NS2B cofactor that were changed 
to alanine are labeled by dots. Abbreviations: DEN, dengue virus; 
JEV, Japanese encephalitis virus; KUNJIN, Kunjin virus; MVE, Murray 
Valley encephalitis virus; TBE, tick-borne encephalitis virus; WNV, 
West Nile virus; YFV, yellow fever virus. 

Generation of mutants. The sequence ⁷⁵LSITI⁷⁹ represents 
the Φx3Φ motif in the NS2B cofactor of dengue virus serotype 
2 which was recently proposed to play a functional role in the 
association of the flaviviral proteases with their corresponding 
cofactors (10). To analyze the effects of amino acid substitutions 
in the NS2B cofactor on the enzymatic activity of the NS3 
serine protease, we constructed the NS2B(H)-NS3pro polyprotein 
precursor by SOE-PCR (Fig. 1) (15). A sequence alignment 
of known flavivirus cofactor sequences and the location 
of the Φx3Φ motif are shown in Fig. 2. 
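
The Φx3Φ pattern (a hydrophobic residue, three arbitrary residues, then a second hydrophobic residue) can be expressed as a simple sequence search. The Python sketch below is only an illustration: the set of residues counted as Φ is an assumption (bulky hydrophobic side chains, with alanine deliberately excluded so that an alanine replacement removes the match), and the function name and the mutant strings are hypothetical.

```python
import re

# Assumed alphabet for the hydrophobic (phi) positions. This choice is
# illustrative and is not taken from the source.
BULKY_HYDROPHOBIC = "LIVMFWY"

# A lookahead is used so that overlapping occurrences are all reported.
PHI_X3_PHI = re.compile(
    rf"(?=([{BULKY_HYDROPHOBIC}].{{3}}[{BULKY_HYDROPHOBIC}]))"
)


def find_phi_x3_phi(sequence, first_residue_number=1):
    """Return (position, pentapeptide) pairs for every phi-x3-phi match.

    Positions are numbered from first_residue_number.
    """
    return [
        (m.start() + first_residue_number, m.group(1))
        for m in PHI_X3_PHI.finditer(sequence)
    ]


# Residues 75 to 79 of the dengue virus type 2 NS2B cofactor.
wild_type_hits = find_phi_x3_phi("LSITI", first_residue_number=75)

# Hypothetical mutant strings: T78A keeps both phi positions,
# while L75A/I79A replaces both of them with alanine.
t78a_hits = find_phi_x3_phi("LSIAI", first_residue_number=75)
double_mutant_hits = find_phi_x3_phi("ASITA", first_residue_number=75)

print(wild_type_hits, t78a_hits, double_mutant_hits)
```

Under the assumed alphabet the wild-type and T78A sequences still match the pattern, whereas the L75A/I79A double mutant does not.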

Previous work had shown that the NS2B(H)-NS3pro protein 
undergoes proteolytic self-cleavage at the NS2B/NS3 site that 
is conducive to the formation of a noncovalent complex (50). 
Alanine substitutions were introduced at residues Trp62, 
Ser71, Leu75, Ile77, Thr78, Ile79, and Leu75 plus Ile79 in the 
NS2B sequence. An enzymatically inactive NS2B(H)-NS3pro 
protein was obtained by replacing the catalytic Ser135 residue 
with alanine and used as negative control for the activity assays 
as described previously (28). All recombinant plasmids were 
subjected to DNA sequencing, and no mutations were found at 
nontargeted sites. Expression of the mutant derivatives as in- 
clusion bodies in E. coli and purification under denaturing 
conditions by immobilized metal chelate chromatography and 
gel filtration yielded homogeneous products, as determined by 
SDS-PAGE analysis. The 29.8-kDa (His)6-NS2B(H)-NS3pro 
molecule displayed anomalous migration in gel electrophoresis 
with a higher apparent molecular mass of approximately 37 
kDa. Subsequent refolding was performed by stepwise dialysis, 
and correct cleavage at the NS2B/NS3 junction was confirmed 
for the wild-type protein by N-terminal amino acid sequencing 
of the 20-kDa cleavage product, which yielded the sequence 
AGVLW, identical to the first five residues of the NS3 protein. 

Effect of alanine substitutions on self-cleavage efficiency. 
Protein samples purified by metal affinity chromatography and 
gel filtration were analyzed by SDS-PAGE for autocleavage at 
the NS2B/NS3 site after various periods of incubation (Fig. 3). 
After extensive dialysis, wild-type NS2B(H)-NS3pro exhibited 
complete autoproteolytic cleavage, which resulted in two pro- 
tein products of 20 kDa and 10 kDa, whereas the S135A 
mutant was completely inactive in the self-cleavage assay. In 
Western blot analysis, only the 10-kDa protein reacted with 
anti-Xpress antibodies directed against the polyhistidine tag, 
which confirmed that this protein represents the N-terminal 
cleavage fragment (His)6-NS2B(H) (data not shown). 

The NS2B mutants displayed different levels of proteolytic 
processing when analyzed by SDS-PAGE immediately after 



refolding. A densitometry analysis based on the amount of the 
NS2B(H)-NS3pro precursor remaining revealed that self- 
cleavage was almost completely abolished in the L75A/I79A 
double mutant. Autoprocessing was markedly reduced in the 
W62A mutant, which gave approximately 10% of wild-type 
activity. In contrast, the cleavage efficiencies of the S71A and 
the T78A mutants were not significantly affected by the sub- 
stitutions and were comparable to that of the wild type. A 
sequence alignment of known flaviviral cofactor sequences 
shows that a serine at position 71 is preferred in most viruses 
of the Flaviviridae family, and substitution of this residue with 
alanine did not have a marked effect on proteolytic self-cleav- 
age. 

Thr78 is part of the Φx3Φ motif, but this residue is not very 
well conserved among the Flaviviridae and we expected that 
one alanine residue in the context of the serotype 2 sequence 
would have only a marginal effect on the activity of the NS3 
protease. In accordance with this prediction, at this position 
the presence of the hydrophobic alanine residue is well tolerated. 
In contrast, alanine replacements at the hydrophobic 
residues Leu75, Ile77, and Ile79 caused a reduction in 
autocleavage activity of the NS2B(H)-NS3pro protein. Substitutions 
at Leu75 and Ile79 reduced autoprocessing to approximately 
55 and 75% of the wild-type value, and the L75A 
mutation had a greater effect on cleavage efficiency than the 
I77A substitution, which still allowed for approximately 80% of 
wild-type precursor cleavage. 

The data presented here support an important function for 
the Φx3Φ motif in activation of the NS3 protease. In agreement 
with a critical role for this hydrophobic sequence element, 
we found that self-cleavage was substantially decreased 
for the L75A/I79A double mutant, which gave only 2% of 
wild-type cleavage. 

We also examined the effect of an alanine replacement at 
the W62 residue. This position is strictly conserved among the 
members of the Flaviviridae family (Fig. 2) and is located in the 
N-terminal region of the NS2B activation sequence. Alanine 
substitution at this position had a dramatic effect on protease 
activity, and self-cleavage was markedly reduced with this mutant 
protein, a finding which suggests a pivotal function for this 
invariant residue in protease activation. 

Delayed processing kinetics of the mutants. To answer the 
question of whether the alanine substitutions at critical posi- 
tions within the NS2B cofactor resulted in a catalytically inef- 
ficient NS3 serine protease, we analyzed the levels of self- 
cleavage after various periods of incubation ranging from 1 to 
24 h (Fig. 3). 

The wild-type NS2B(H)-NS3pro molecule and the S71A 
mutant underwent complete autoproteolytic cleavage during 
the refolding process, whereas progressive proteolysis leading 
to complete cleavage of the precursor was observed for the 
I77A, T78A, and I79A mutants. Continued proteolysis of the 
S71A mutant protein resulted in additional cleavage fragments 
in SDS-PAGE at molecular masses of approximately 16 and 12 
kDa. These proteins could represent fragments generated by 
cleavage at internal sequences within the NS3pro molecule. 
The NS3pro sequence contains paired basic residues at positions 
⁶³KRI⁶⁵ and ¹⁴²KKG¹⁴⁴ and a monobasic site resembling 
the NS2B/NS3 junction at ²⁷QRG²⁹ which could serve as additional 
substrates for the protease, and cleavage at these sites 
would generate products of the observed sizes. Whether the 
additional cleavage products are formed by internal cleavage at 
these sites remains to be investigated. For the L75A mutant, 
approximately 75% cleaved precursor was observed at 24 h of 
incubation. The L75A/I79A double mutant showed weak 
cleavage activity and yielded only 50% precursor cleavage after 
24 h. The W62A mutant did not display a significant increase 
in the amount of autoprocessing products at 24 h. Therefore, it 
is likely that the W62A substitution within the NS2B cofactor 
results in a significant inactivation of the NS2B-NS3 protease. 

FIG. 3. Kinetics of proteolytic autoprocessing of the NS2B(H)- 
NS3(pro) mutant derivatives. (A) Samples of wild-type NS2B(H)-NS3pro 
and the mutant proteins were refolded by successive dialysis, 
incubated at 37°C, and analyzed on Coomassie blue-stained 
Tricine-SDS-PAGE gels as described in the text. Lane M, protein molecular 
size markers. Numbers above the lanes indicate incubation times ranging 
from 0 to 24 h. (B) Band intensities of wild-type NS2B(H)-NS3pro 
and the mutant polypeptides as observed on Tricine-SDS-PAGE gels 
were quantitated by densitometric analysis with the ONE-D scan program, 
and the fraction of cleaved NS2B(H)-NS3(pro) precursor was 
plotted as a function of incubation time. 

TABLE 1. Kinetic parameters of the NS2B(H)-NS3pro mutant proteases* 

Construct    Activity (% of        K_m (µM)      k_cat (min⁻¹)    k_cat/K_m 
             wild-type activity)                                  (M⁻¹ s⁻¹) 
Wild type    100                   146 ± 5.4     1.2 ± 0.07       137 ± 3.0 
S135A        ND                    ND            ND               ND 
W62A         ND                    ND            ND               ND 
S71A         118                   133 ± 7.0     1.3 ± 0.03       163 ± 10.0 
L75A         3.4                   180 ± 4.9     0.05 ± 0.007     4.7 ± 0.5 
I77A         8.4                   171 ± 15.3    0.12 ± 0.01      11.5 ± 0.13 
T78A         65                    225 ± 5.4     1.2 ± 0.09       89.0 ± 9.0 
I79A         1.9                   181 ± 8.0     0.03 ± 0.004     2.6 ± 0.2 
L75A/I79A    ND                    ND            ND               ND 

* Protease activity was assayed in 0.1 M Tris-HCl (pH 9.0) for 60 min at 37°C 
with the fluorogenic peptide GRR-AMC at concentrations ranging from 10 to 
250 µM. Standard reactions contained protease at a concentration of 0.15 µM. 
No activity was observed with the W62A mutant and the L75A/I79A double 
mutant at a 1.5 µM enzyme concentration. The activity of the wild-type enzyme 
NS2B(H)-NS3pro with GRR-AMC was taken as 100%. ND, not detectable. 

Reactivity with small substrate peptides. The NS3 protease 
reacts with small model substrates for serine proteases in the 
absence of the NS2B cofactor, but cleavage efficiency of the 
protease towards synthetic tripeptide substrates is significantly 
stimulated in the presence of the 40-residue NS2B activation 
sequence (50). Cleavage at the NS2B/NS3 site is not a prerequisite 
for the reaction with small substrates, as shown by the 
activity of the NS2B-NS3pro precursor with 12-mer peptides 
(29). This protein did not display significant levels of 
autocleavage. 

FIG. 4. Steady-state cleavage kinetics of the GRR-AMC substrate 
by NS2B(H)-NS3pro derivatives in vitro. Reactions were performed in 
triplicate at a 0.15 µM protein concentration over a range of substrate 
concentrations from 10 to 250 µM. Assays were performed with 
60-min incubation periods, and kinetic parameters were obtained by 
nonlinear regression analysis of initial velocities prior to 10% substrate 
depletion. The L75A/I79A double mutant and the W62A mutant had 
no measurable activity at a 10-fold-higher enzyme concentration (1.5 
µM). Data were plotted in a double reciprocal format. 

Comparison of activities with the synthetic peptide GRR-AMC 
between wild-type NS2B(H)-NS3pro and the mutant 
derivatives of NS2B revealed that the alanine substitutions 
affected the rate of substrate hydrolysis. Data for the kinetic 
parameters are presented in Table 1. The alanine substitutions 
mainly affected the k_cat values of the recombinant proteases, 
whereas Michaelis-Menten equilibrium constants appeared to 
be less affected (Fig. 4). For the T78A mutant, only a 1.5-fold 
increase in K_m was observed, whereas the largest changes in 
k_cat occurred in the L75A and I79A mutants, which had k_cat 
values that were 24- and 40-fold lower, respectively, than that 
of the wild type. In the case of the S71A mutant, the replacement 
resulted in a slightly more active enzyme, with a k_cat/K_m 
value that was 1.2-fold higher than that of the wild type. 
Catalytic efficiencies expressed as k_cat/K_m values were substantially 
reduced for the L75A, I77A, and the I79A mutants, which had 
30-, 12-, and 52-fold-lower efficiencies, respectively, compared 
to the wild type. The activity of the L75A/I79A double mutant 
and the W62A mutant with GRR-AMC was negligible under 
the conditions of the assay, and a 10-fold increase in enzyme 
concentration did not result in detectable conversion of the 
substrate. 
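
The fold changes quoted above follow from the Table 1 entries by simple arithmetic. The Python sketch below recomputes the catalytic efficiency from K_m and k_cat and compares it with the tabulated value. It assumes that the K_m column is in µM and the k_cat column in min⁻¹ (the column headers are partly lost in this copy, but the reported efficiencies are consistent with these units). The dictionary layout and function name are hypothetical.

```python
# Table 1 values: (K_m in uM, k_cat per min, reported k_cat/K_m per M per s).
# Units of the first two columns are assumptions, as noted in the text.
TABLE_1 = {
    "Wild type": (146.0, 1.2, 137.0),
    "S71A": (133.0, 1.3, 163.0),
    "L75A": (180.0, 0.05, 4.7),
    "I77A": (171.0, 0.12, 11.5),
    "T78A": (225.0, 1.2, 89.0),
    "I79A": (181.0, 0.03, 2.6),
}


def catalytic_efficiency(k_m_um, k_cat_per_min):
    """Return k_cat/K_m in M^-1 s^-1 from K_m (uM) and k_cat (min^-1)."""
    k_cat_per_s = k_cat_per_min / 60.0
    k_m_molar = k_m_um * 1e-6
    return k_cat_per_s / k_m_molar


wt_km, wt_kcat, wt_eff = TABLE_1["Wild type"]

for name, (k_m, k_cat, reported) in TABLE_1.items():
    computed = catalytic_efficiency(k_m, k_cat)
    print(
        f"{name:10s} computed {computed:6.1f}  reported {reported:6.1f}  "
        f"K_m ratio {k_m / wt_km:4.2f}  "
        f"k_cat fold-drop {wt_kcat / k_cat:5.1f}  "
        f"efficiency fold-drop {wt_eff / reported:5.1f}"
    )
```

The recomputed efficiencies lie close to the tabulated ones, and the K_m ratios stay near 1 while k_cat drops by more than an order of magnitude for L75A and I79A, which is the pattern the text describes.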

It is unlikely that the differences in catalytic efficiency which 
we observed between the wild-type and mutant NS2B(H)- 
NS3pro proteins result simply from a distortion of the NS2B/ 
NS3 cleavage site, since the mutations seem to affect auto- 
cleavage and reactivity with small peptides as well. Since the 
gel electrophoresis assay does not allow detection of trans 
cleavage activity, we cannot exclude the possibility that some of 
the proteolysis products shown in Fig. 3 are generated by trans 
cleavage of the NS2B(H)-NS3pro precursor. Autoprocessing 
at the NS2B/NS3 site is not strictly required for trans cleavage 
activity, and for mutants displaying intermediate levels of au- 
toproteolysis, the unprocessed precursor NS2B(H)-NS3pro 
may also contribute to enzymatic conversion of the synthetic 
substrate (29). However, mutations in the NS2B/NS3 cleavage 
site sequence which abolish autoprocessing result in catalyti- 
cally poor NS3 proteases with inefficient trans cleavage of less 
than 10% of the wild-type activity (R. Khumthong, unpub- 
lished data). 

In summary, the alanine substitutions in the cofactor had 
greater effects on the reaction rates of the NS3 protease than 
on substrate binding. This would imply a model for the 
activation of the NS3 protease in which the cofactor contributes 
mainly to an arrangement of the residues of the catalytic triad 
that is optimized for proton transfer during substrate cleavage. 

DISCUSSION 

Previous studies have shown that the activity of the dengue 
virus NS3 serine protease depends critically on the presence of 
the small NS2B cofactor protein (32, 50). In this report, we 
further investigated the structural determinants for the inter- 
action of the NS3 protease with the NS2B cofactor by gener- 
ating alanine substitutions at selected positions possibly in- 
volved in association of the protease-cofactor complex. Effects 
on the enzymatic activity of the NS3 protease were determined 
by analysis of autoprocessing at the NS2B/NS3 site and by 
reaction with the synthetic peptide GRR-AMC. The enzymatic 
data which we obtained for both types of reactions indicate an 
extreme sensitivity of NS3 cleavage activity to the correct con- 
formation of the NS2B cofactor. 

First, we examined the functional relevance of the Φx3Φ 
motif for cofactor-induced activation of the NS3 protease. 
Based on structural and mutational evidence obtained for the 
GB virus and hepatitis C virus NS3 proteases, a consensus 
sequence element involved in flaviviral protease activation was 
recently identified (10). By comparison between the structures 
of the hepatitis C virus NS3 protease and the NS3/NS4A 
cocomplex, it was hypothesized that the binding pocket for the 
first Φ residue (Leu75 in dengue virus type 2) undergoes a 
substantial conformational change, whereas the pocket for the 
second Φ residue (Ile79) remains largely unchanged upon 
complexation with the cofactor (30, 36). The second hydrophobic 
amino acid was proposed to occupy the hydrophobic pocket 
between the two β-barrel subdomains and to contribute to 
stabilization of the relative orientation of these subdomains. 
Our data show that the alanine substitution at Ile79 had a 
greater effect on NS3 autocleavage activity than the substitu- 
tion at Leu75, whereas the catalytic efficiency was approxi- 
mately 1.8-fold lower in the I79A mutant. A drastic effect on 
protease activation was observed with the L75A/I79A double 
mutant, in which autocleavage was almost completely elimi- 
nated and enzymatic activity with the GRR-AMC peptide was 
not detectable under the conditions of our assay. 

In addition, we examined two noncritical residues, Ile77 and 
Thr78, which are located within the Φx3Φ motif. Although 
these two mutations had only marginal effects on autocleavage, 
the alanine substitution at Ile77 resulted in reduced enzyme 
activity, as demonstrated by a 12-fold-lower k_cat/K_m value 
compared to the wild-type enzyme. The T78A substitution yielded 
a less active enzyme with a 1.5-fold higher K_m value for the 
GRR-AMC substrate than the wild type. These results support 
a role for the Φx3Φ motif in the activation mechanism of the 
dengue virus NS2B cofactor in which the unspecified residues 
also contribute to the interactions which are necessary for 
protease activation, a feature which discriminates the dengue 
virus protease from the recently analyzed GB virus NS3 
protease (10). It appears that the association between the cofactor 
and the protease is mainly directed by hydrophobic interactions 
and that the mutations introduced in this region had 
greater effects on the catalytic efficiency of the protease than 
on the K_m values, suggesting that perturbation of the 
hydrophobic interactions in the Φx3Φ motif may primarily affect the 
geometry of the catalytic triad. 

Intermediate effects on enzymatic activity, as observed for 
the I77A, I79A, and T78A mutants, may reflect a role of these 
residues not only in the conformational activation of the cat- 
alytic apparatus of the enzyme but also for the stabilization of 
a ternary substrate-cofactor-protease complex. For the hepa- 
titis C virus protease, a synergistic cooperation between the 
cofactor and the substrate was proposed to form an induced-fit 
protease with optimal catalytic activity and high specificity for 
the polyprotein substrate (5). Therefore, in the dengue virus 
NS2B/NS3 protease, additional residues located outside the 
Φx3Φ motif may contribute to the structural rearrangements 
induced by cofactor binding. 

Substitution of Trp62 with alanine had the strongest effect 
on the activity of the NS3 protease. This residue is located 
outside the proposed short activation sequence which is ho- 
mologous to the hepatitis C virus NS4A peptide (8). A deletion 
construct lacking the Trp62 residue was previously shown to be 
inactive in cleavage assays, and the sequence comprising resi- 
dues 58 to 62 was implicated in the conformational stabiliza- 
tion of the protease (19). The finding that Trp62 is of para- 
mount importance for the activation mechanism also provides 
an explanation for the inability of the peptide Gly69-Glu83 to 
replace the NS2B core sequence in vitro (32); however, it does 
not exclude the possibility that additional interactions at the N 
terminus of the NS2B(H) sequence are necessary for optimal 
activity, as was shown earlier for the yellow fever virus NS3 
protease (17). 

With the exception of the Trp62 substitution, the mutations 
which we introduced in the NS2B sequence of dengue virus 
type 2 do not simply abolish the enzymatic activity of the NS3 
protease. Instead, we observed slow kinetics for autoprocessing 
at the NS2B/NS3 site upon prolonged incubation of the mutant 
enzymes. The kinetic analysis of GRR-AMC cleavage and the 
self-cleavage reaction indicates the existence of an inefficient 
catalytic machinery for both types of substrate conversion in 
the NS2B mutants. 

In the absence of a crystallographic structure for the dengue 
virus NS2B-NS3 complex, we generated a model based on 
homology with the hepatitis C virus NS4A peptide (49). In 
analogy to the structure of the hepatitis C virus NS3/NS4A 
complex, the model predicts a threading of the cofactor in an 
extended conformation on a large and mainly hydrophobic 
surface groove of the NS3 protease formed by the N- and 
C-terminal domains (Fig. 5). Both hydrophobic residues Leu75 
and Ile79 of the Φx3Φ motif occupy a hydrophobic pocket at 
the domain interface. The interactions observed in this model 
also predict a crucial role for the Gln96 residue in the NS3 
sequence by formation of a hydrogen bond to Ile79 in the 
NS2B main chain, an observation which would be consistent 
with the weak protease activity of a recently described V95A/ 
Q96A NS3 mutant (37). 

In the model presented here, the critical residue Trp62 is 
located in close proximity to an N-terminal cluster of proline 
residues, Pro10, -11, and -12, in the NS3 sequence, and the 
NS2B peptide is attached to a surface-exposed structure of the 
NS3 protease. This interaction, which is reminiscent of the 
N-terminal clamping observed with the hepatitis C virus NS4A 
peptide, may govern the correct association of the cofactor 
with the NS3 surface (49). Alternatively, it is conceivable that 
the binding of the cofactor contributes to a structural 
organization of the N-terminal region of the NS3 protein, since the 
N-terminal residues of NS3 display a high degree of 
conformational flexibility (31). For the hepatitis C virus NS3 
protease, folding of the N-terminal 28 residues into a β-strand and 
an α-helix was observed as a result of NS4A binding (49). 

FIG. 5. Molecular model of the interaction between the NS2B core 
segment and the NS3 protease. The model was generated by Deepview 
Swiss-PdbViewer. (A) NS2B(H) (residues 56 to 93) mapped onto the 
X-ray crystal structure of NS3. The surface of NS3pro is shown in blue, 
and NS2B(H) is shown as a ribbon in yellow. The location of Trp62 at 
the N terminus of NS2B(H) is shown. (B) Interaction of residues 
Leu75, Ile77, and Ile79 with the hydrophobic pocket of NS3. The 
surface of NS3 is colored by the residue type (yellow, polar; blue, basic; 
red, acidic; white, nonpolar); the NS2B segment is shown as a yellow 
stick with the side chains of Leu75, Ile77, and Ile79 in grey. (C) 
Association of the Trp62 residue (red) of NS2B(H) (yellow ribbon) 
with the N-terminal proline cluster of NS3 (side chains in green and 
NS3 in blue). 

Further predictions based on this model, especially changes 
in the geometry of the active site associated with NS2B bind- 
ing, would be too speculative at the moment. Elucidation of 
the precise mechanism of cofactor-dependent activation of the 
dengue virus NS3 protease has to await the resolution of the 
three-dimensional structure of the NS2B-NS3 cocomplex. 

The cofactor-induced activation process is comparatively 
well characterized for the hepatitis C virus NS3 protease by a 
number of structural and spectroscopic studies. Nuclear mag- 
netic resonance experiments revealed large nonlocal structural 
changes leading to a catalytic triad which is better ordered in 
the presence of NS4A (38). Solution structures obtained with 
a covalently bound ketoacid inhibitor disclosed a hitherto un- 
recognized role for the substrate in stabilization of the catalytic 
machinery by the formation of hydrogen bonds within the S' 
subsite of the enzyme (2). According to this model, complex- 
ation of the protease with the NS4A cofactor leads to indirect 
activation of NS3 and induces a conformation which is preor- 
ganized for substrate binding (5). It remains to be investigated 
whether similar enzyme-substrate interactions and induced-fit 
mechanisms contribute to active-site stabilization of the den- 
gue virus NS2B-NS3 protease. 

Taken together, our results support a model for the activa- 
tion of the dengue virus protease by the NS2B cofactor which 
depends critically on the presence of specific residues in the 
cofactor core sequence rather than on the overall conformation. 
The residues located within the structural Φx3Φ motif 
play an important role in this activation process, an observa- 
tion which confirms earlier findings for related flaviviral en- 
zymes. In addition, we have shown that a single residue, Trp62, 
located in the N-terminal region of the NS2B core segment is 
of high relevance for conformational activation. The structural 
reasons for this unusual requirement are not entirely clear at 
the moment, and further studies are required to investigate the 
complex structure-activity relationships of the dengue virus 
two-component protease. These investigations are not only 
useful for the understanding of the cofactor-induced activation 
of proteolytic enzymes, but may also facilitate the development 
of inhibitors which interfere with the formation of viral 
protease complexes. 

Abstract 

Using ELISA we provide direct evidence that the 
midgut defensins of the blood-sucking fly Stomoxys 
calcitrans are secreted into the gut lumen. We show 
that midgut defensin peptide levels increase up to 
fortyfold in response to a blood meal but not to a sugar 
meal. The data suggests the midgut defensin genes 
are post-transcriptionally regulated and that their 
function is protection of the stored blood meal from 
bacterial attack while it awaits digestion. Using 
recombinant defensins produced in Pichia pastoris 
we demonstrate that while in the gut cells the midgut 
defensins are bound in an SDS-stable complex to 
proteins with an apparent molecular weight of 
> 26 kDa from which they are released when secreted 
into the gut lumen. This > 26 kDa protein (Ssp3) has 
been cloned and sequenced and is a member of 
the serine protease S1 family with homologies to 
multiple insect proteases and to vertebrate trypsins 
and elastases. 

Keywords: defensin, serine protease, immunity, blood- 
sucking, midgut. 

Introduction 

The major immune responses of insects, or at least the 
most studied, are mounted from the insect fat body. 
Challenge to the insect leads to transcriptional up-regulation 
of a series of genes and the de novo synthesis and secre- 
tion of a battery of immune peptides from the fat body 
(Hoffmann & Reichhart, 1997). These insect immune 
responses are non-clonal and have many similarities to the 
innate immune system of vertebrates (Medzhitov & Janeway, 
1997). Insects also have well developed epithelial immune 
responses (Brey et al., 1993; Lehane et al., 1997; Tzou 
et al., 2000). In Drosophila all of the epithelial surfaces of 
the insect body produce antimicrobial peptides usually as 
a subset of the total number of antimicrobial peptides in the 
insect; the subset is usually complementary containing 
both anti-Gram positive and anti-Gram negative activities 
(Tzou et al., 2000). Among these epithelia only Drosophila 
midgut expresses all of the antimicrobial peptides (Tzou 
et al., 2000) and this reflects the relative vulnerability of the 
gut to infection, which is common to all metazoa. Given that 
the midgut is the primary site of entry for the most important 
insect borne parasites including malaria, trypanosomes 
and arbovirus for example, it is surprising how little we know 
about insect midgut immunity, particularly in blood-sucking 
insects. We do know that agglutinins play an antiparasitic 
role in the midgut of some insects (Maudlin, 1991), that 
peritrophic matrix has a defensive function (Lehane, 1997), 
that the anti-Gram positive enzyme lysozyme may be 
present and that Plasmodium ookinetes may, on occasion, 
be lysed by undefined mechanisms in the midgut (Vernick 
et al., 1995). 

In blood-sucking insects secretion of antimicrobial 
peptides in the midgut epithelium has been demonstrated 
in Stomoxys calcitrans (Lehane et al., 1997), Anopheles 
gambiae (Dimopoulos et al., 1997) and Glossina morsitans 
morsitans (Hao et al., 2001). The two midgut-specific 
defensins reported from the anterior midgut (reservoir) of 
the stable fly S. calcitrans were particularly unusual in 
being specific to that tissue and being constitutively 
produced (Lehane et al., 1997; Munks et al., 2001). Those 
studies were performed exclusively on mRNA from these 
genes. Consequently there is no direct proof that those 
defensin peptides are indeed used in the midgut lumen. 
Also there is mounting evidence that many genes in the 
midgut of blood-sucking insects are post-transcriptionally 
controlled so that meaningful data on gene product levels 
can only come from studies of the peptides themselves 
(Muller et al., 1995; Lehane et al., 1998; Noriega & Wells, 
1999). In this study we set out to use protein-based 
techniques to obtain a clearer view of the biology of the 
S. calcitrans midgut defensins. 

Results 

Recombinant proteins 

The Pichia system produced low levels of recombinant 
protein: range 0.1-16.7 µg/l for Smd1 and 0.6-6.2 µg/l for 
Smd2. Mass spectrometric analysis (data not shown) 
suggests both recombinant defensins were modified 
compared to the native forms. Recombinant Smd1 was 
truncated losing its first four amino acids. Recombinant 
Smd2 was glycosylated (internal N-linked site at amino 
acid positions 12-14 in the mature peptide) whereas the 
native form is not (Lehane et al., 1997). Changing the expression 
medium from minimal methanol to a medium 
enriched with 1% casamino acids made no difference to 
either modification. Despite these structural changes from 
the native defensins these recombinant proteins pro- 
vided useful positive controls in the antisera work as 
described below. 

Specificity and cross-reactivity of Smd1 and Smd2 
rabbit antisera 

The specificity of the antisera was determined by Western 
blotting (Fig. 1) and ELISA (Fig. 2); the cross-reactivity of 
the two sera with each other was also determined in ELISA 
(Fig. 2). Western blotting with the sera against recombinant 
Smd1 and Smd2 demonstrated that each serum recognized 
a single band, at approximately 4 kDa, which corresponded 
with the predicted molecular weights of 4736 Da 
for Smd1 (Fig. 1A) and 4237 Da for Smd2 (Fig. 1B) 
(Lehane et al., 1997). These protein bands were not recognized 
by either the control serum or the pre-immune sera 
(Fig. 1C, lanes 1 & 2 and 3 & 4, respectively). 

In ELISA, checkerboard titration of antisera and antigen 
revealed that the optimum serum dilution was 1 : 5000 
when an antigen coating concentration of 5 µg/ml was 
used (data not shown). ELISA demonstrated that Smd1 
and Smd2 antisera recognized both the synthetic peptide 
against which they were raised (data not shown) and the 
recombinant Smd1 and Smd2 peptides (Fig. 2). Control 
and pre-immune sera were used to establish baseline 
background activity. In summary, the two sera used in this 
study were sensitive, specific and did not cross-react with 
each other (Figs 1 and 2). 

Figure 1. Specificity of Smd1 and Smd2 antisera. Western blot of 
0.87 µg/lane of recombinant Smd1 (Panels A & C) or Smd2 (Panel B) 
peptide separated on 16.5%T, 3%C Tris-Tricine SDS-PAGE probed with 
A, anti-Smd1 sera (MJL3), 1 : 50 dilution (lane 1); 1 : 100 dilution (lane 2); 
1 : 200 dilution (lane 3). B, anti-Smd2 sera (MJL8), 1 : 50 dilution (lane 1); 
1 : 100 dilution (lane 2); 1 : 200 dilution (lane 3). C, control sera (MJL10) 
1 : 50 dilution (lane 1); 1 : 100 dilution (lane 2). Pooled pre-immune sera 
1 : 50 dilution (lane 3) and 1 : 100 dilution (lane 4). 

Figure 2. Sensitivity, specificity and cross- 
reactivity of Smd1 and Smd2 antisera in ELISA. 
Corrected ELISA optical densities, obtained as 
described in ELISA experimental procedures 
section. Graph 1: recombinant Smd1 peptide 
probed with Smd1 antiserum (MJL3, black bar), 
Smd2 antiserum (MJL8, white bar), control 
antiserum (MJL10, light grey bar) and pre-immune 
serum (dark grey bar). Graph 2: recombinant 
Smd2 peptide probed with Smd2 antiserum (MJL8, white bar), 
Smd1 antiserum (MJL3, black bar), control 
antiserum (MJL10, light grey bar) and pre-immune 
serum (dark grey bar). 

Figure 3. Colocalization of defensin and Ssp3 in 
the midgut of Stomoxys calcitrans. Western blots 
of midgut samples separated on 16.5%T, 3%C 
Tris-Tricine SDS-PAGE. A, effect of spiking whole 
reservoir zone homogenate with recombinant 
Smd1: 10 µg homogenate from unfed flies (lane 
1); 10 µg homogenate from unfed flies spiked 
with 2.5 µg recombinant Smd1 peptide (lane 2); 
10 µg homogenate from fed flies (24 h post blood 
meal) (lane 3); 10 µg homogenate from fed flies 
spiked with 2.5 µg recombinant Smd1 peptide 
(lane 4). B, presence of Smd1 defensin (~ 4 kDa 
band) and Ssp3 (> 26 kDa band) in 50 µg 
reservoir zone homogenate from fed flies (24 h 
post blood meal). C, presence of Ssp3 (> 26 kDa 
band) in 10 µg reservoir zone tissue (lane 1) and 
Smd1 defensin (~ 4 kDa band) in 10 µg of 
reservoir zone contents (lane 2) from fed flies 
(24 h post blood meal). 

SDS-stable complexes with Smd1 defensin 

Western blotting of 10 µg per lane of reservoir zone 
homogenates revealed the presence of a > 26 kDa band(s) 
when probed with Smd1 antiserum (Fig. 3A, lane 1) and 
Smd2 antiserum (data not shown), but did not demonstrate 
the presence of the expected ~ 4 kDa band (defensin). 
When these reservoir zone homogenates (10 µg per lane) 
were spiked with recombinant Smd1 peptide (2.5 µg per 
lane) and probed with Smd1 antiserum, the apparent 
molecular weights of these > 26 kDa bands increased 
(Fig. 3A, lanes 1 & 2). This was consistent for midgut 
homogenates from both unfed (Fig. 3A, lanes 1 & 2) and 
fed flies (Fig. 3A, lanes 3 & 4). In these 'spiking' experiments 
we did not observe the presence of the expected 
~ 4 kDa defensin band. When we increased the amount of 
midgut homogenate loaded on to the gels to 50 µg per lane 
and probed the blot with Smd1 antisera we detected the 
presence of both the > 26 kDa band and the ~ 4 kDa 
defensin band (Fig. 3B). 

After establishing the presence of Smd1 defensin in the 
midgut we repeated the experiment using flies dissected 
24 h after a blood meal, but this time carefully separated 
the reservoir tissue from the reservoir contents. We now 
found that the > 26 kDa band was only present in the 
reservoir tissue (Fig. 3C, lane 1) whilst the ~ 4 kDa band 
was only present in the gut contents (Fig. 3C, lane 2). These 
results, coupled with the results from the 'spiking' experiment 
(Fig. 3A), led us to hypothesize that Smd1 peptide is 
able to associate in an SDS-stable complex with this higher 
molecular weight protein. Although no other specific 
defensin-protein SDS-stable complexes are described in 
the literature, other SDS-stable peptide-protein complexes 
are well known from vertebrate studies (Scott et al., 1999). 

The presence of a > 26 kDa doublet in some of the 
experiments may be due to the detection of a modified form 
of the > 26 kDa protein, for example glycosylation or other 
post-translational modifications, or may simply be due to 
incomplete reduction of disulphide bonds during sample 
preparation. It is also possible that there may be some anti- 
body cross-reactivity, either specific cross-reactivity of a 
similar or related protein or non-specific cross-reactivity. 
However, the observation of an increase in the apparent 
molecular weight of both bands in the doublet after spiking 
with recombinant Smd1 peptide (Fig. 3A) suggests that the 
doublet represents a modified or partially reduced form of 
the > 26 kDa protein. 

Western blotting revealed that Smd1 defensin was only 
detected in the reservoir zone homogenates and was not 
present in other regions of the midgut including the proven- 
triculus, thoracic midgut, opaque zone or lipoid zone (data 
not shown). In addition, Smd1 defensin was not present in 
the fat body, Malpighian tubules, brain or thoracic flight 
muscle (data not shown). 

Purification and N-terminal amino acid sequence 

The > 26 kDa protein that complexed with Smd1 was puri- 
fied by repeated excision of the band from SDS-PAGE and 
electro-elution until a single band was revealed by SDS- 
PAGE. The identity of the band was confirmed by Western 
blotting a sub-sample with anti-Smd1 serum before obtaining 
the following partial N-terminal amino acid sequence by 
Edman sequencing: IVGGNAFAHEGQFPHQVSS. Blast 
searches revealed that amino acids 1-17 of the > 26 kDa 
protein had 76% identity with amino acids 50-66 of serine 
protease SP24D from Anopheles gambiae, suggesting that 
this peptide fragment was from a serine protease. 
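
The 76% figure is consistent with 13 identical residues out of 17 aligned positions. The following minimal sketch shows how such an ungapped percent identity can be computed. The query is residues 1-17 of the Edman sequence reported above; the subject string is purely hypothetical (four arbitrary substitutions) and is not the SP24D sequence, which is not reproduced in the text.

```python
def percent_identity(query: str, subject: str) -> float:
    """Percent of aligned positions carrying identical residues (no gaps)."""
    if len(query) != len(subject):
        raise ValueError("sequences must be aligned to equal length")
    matches = sum(q == s for q, s in zip(query, subject))
    return 100.0 * matches / len(query)


# Residues 1-17 of the Edman sequence reported in the text.
QUERY = "IVGGNAFAHEGQFPHQVSS"[:17]

# HYPOTHETICAL subject: the query with four arbitrary substitutions.
# This is an illustration only, not residues 50-66 of SP24D.
SUBJECT = "IVGGSAFAKEGQYPHQI"

if __name__ == "__main__":
    print(f"{percent_identity(QUERY, SUBJECT):.1f}% identity over {len(QUERY)} residues")
```

Any alignment with 13 matches in 17 positions gives the same rounded value, so the figure alone does not identify which residues differ.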

Full length clone sequence and alignment with 
other serine proteases 

Using a degenerate sense primer designed from the least 
conserved portions of the Edman sequence and the M13-20 
primer, a full-length cDNA was cloned from the library 
(accession number AY044834). It was 881 nucleotides 
long, coding for a prepro-protein of 254 amino acids. The 
predicted molecular weight of the full-length prepro-protein 
is 27 523.30 Da with a theoretical pI of 5.85 and charge of 
-4.25. SignalP analysis V 1.1 (Nielsen et al., 1997) of the 
sequence suggests that the leader sequence consists of 
amino acid residues 1-23, which is cleaved between a 
serine and alanine residue. We suggest the six amino acid 
activation peptide ARPRPR is cleaved from the mature 
protein between the arginine and isoleucine at position 30. 
The theoretical molecular weight, charge and pI of the signal 
peptide would be 2601.40 Da, +0.91 and 8.83, respectively, 
and for the activation peptide they would be 750.90 Da, 
+2.91 and 12.40, respectively. The mature protein has a 
theoretical molecular mass of 24 205.00 Da, a charge of 
-8.25 and a pI of 5.00. 

Figure 4. Mean Smd1 and Smd2 levels in adult Stomoxys calcitrans haemolymph, reservoir zone tissue, reservoir zone lumen, lipoid zone tissue and lipoid 
zone lumen. A, mean and standard errors of Smd1 levels from unfed (UF), sucrose fed (S) and blood fed (B) flies at 24, 48 and 72 h after feeding. Results are 
from five experiments. B, mean and standard errors of Smd2 levels. * indicates a significant difference from haemolymph, reservoir zone tissue and lipoid zone 
tissue (P < 0.05, one-way ANOVA and Fisher's pair-wise comparison). 
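
The residue numbering given for the Ssp3 prepro-protein can be checked with a few lines of arithmetic. The sketch below uses only values stated in the text (254 residues, signal peptide 1-23, activation peptide ARPRPR, mature mass 24 205.00 Da); the variable names are illustrative, and the mean residue mass is a derived sanity check rather than a reported result.

```python
PREPRO_LENGTH = 254            # residues encoded by the cDNA (from the text)
SIGNAL_PEPTIDE = (1, 23)       # SignalP prediction (from the text)
ACTIVATION_PEPTIDE = "ARPRPR"  # six-residue activation peptide (from the text)
MATURE_MASS_DA = 24205.00      # theoretical mass of mature Ssp3 (from the text)

# The activation peptide follows the signal peptide directly.
activation_start = SIGNAL_PEPTIDE[1] + 1
activation_end = activation_start + len(ACTIVATION_PEPTIDE) - 1

# The mature protein begins at the next residue (the Ile of IVGG...).
mature_start = activation_end + 1
mature_length = PREPRO_LENGTH - activation_end

# Derived check: mean mass per residue of the mature protein.
mean_residue_mass = MATURE_MASS_DA / mature_length

if __name__ == "__main__":
    print(f"activation peptide: residues {activation_start}-{activation_end}")
    print(f"mature protein: residues {mature_start}-{PREPRO_LENGTH} ({mature_length} aa)")
    print(f"mean residue mass: {mean_residue_mass:.1f} Da")
```

The mature protein therefore starts at position 30, in agreement with the stated cleavage site, and the derived mean residue mass of about 108 Da is in the range expected for a typical protein.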

Blastp was used to search GenBank with the > 26 kDa 
sequence (Altschul et al., 1997) and this revealed homology 
of the > 26 kDa protein with members of the serine protease 
S1 family (39% identity with chymotrypsin 1 and 37% identity 
with serine protease SP24D from Anopheles gambiae). 
Consequently we have named the protein Stomoxys serine 
protease 3 (Ssp3) (accession number AY044834). 

Localization of Smd1 and Smd2 peptides and Ssp3 mRNA 

The distribution of Smd1 and Smd2 in the adult fly was 
investigated using ELISA (Fig. 4). Low levels of Smd1 and 
Smd2 were present in the reservoir zone tissues in unfed 
and sucrose fed flies. However, in blood fed flies these 
levels increased at least twofold within 24 h of feeding. The 
highest levels of Smd1 and Smd2 were detected in the 
reservoir zone lumen where there was a fortyfold increase 
in Smd1 and a twenty-threefold increase in Smd2 24 h post 
blood meal. Gut lumen defensin levels did not increase in 
response to a sucrose meal. This suggests that Smd1 and 
Smd2 are up-regulated only in response to the blood meal 
and secreted into the midgut lumen. The finding of defensins 
in lipoid zone lumen presumably reflects passage down the 
gut lumen from the reservoir because no defensin mRNA 
is present in lipoid zone tissue (Lehane et al., 1997; Munks 
et al., 2001). Compared to unfed flies the levels of Smd1 
and Smd2 in the reservoir tissues, but not the lumen, 
remain elevated up to 72 h post blood meal. RT-PCR 
shows that Ssp3 is not restricted to the anterior midgut but 
is found in all tissues tested in the adult fly (Fig. 5). 

Figure 5. RT-PCR of Ssp3 mRNA. mRNA was prepared from anterior 
midgut (cardiac, thoracic and reservoir zones, lane 1), posterior midgut 
(opaque and lipoid zones, lane 2), fat body (lane 3) and carcass (lane 4). 
Negative control (no mRNA, lane 5) and positive control (Ssp3 clone, 
lane 6). 

Importance of Smd1 and Smd2 in overall midgut 
antimicrobial response 

Zone inhibition assays confirm the presence of antimicrobial 
activity in midgut homogenates (Fig. 6, zone A). Neither the 
control antibody nor isotonic saline had any effect on the 
growth of M. luteus. When midgut homogenates were 
incubated with anti-Smd1 antibody there was a 53% reduction 
in the zone of inhibition (Fig. 6, zone B), when homogenates 
were incubated with anti-Smd2 antibody there was a 47% 
reduction (Fig. 6, zone C) and when the homogenates 
were incubated with both antibodies there was a 59% 
reduction in antimicrobial activity (Fig. 6, zone D), suggesting 
the presence of other anti-Gram positive agents in the 
midgut. We were unable to produce recombinant defensins 
that were exactly the same as the native defensins; therefore 
we were unable to establish that the concentration of 
antibodies used in these inhibition assays would be sufficient 
to inhibit purified defensins. Interestingly, when Smd1 
and Smd2 were inhibited by the antibodies M. luteus 
appeared to encroach back into the inhibition zone, suggesting 
that the other antimicrobial agents in the midgut 
may be bacteriostatic rather than bactericidal or that they 
are relatively unstable. 

Figure 6. Zone inhibition assay using M. luteus as the test organism. 
Samples in wells were 4 midgut equivalents (w/w protein) from A, gut 
homogenate alone; B, gut homogenate incubated with Smd1 antibody; C, 
gut homogenate incubated with Smd2 antibody; D, gut homogenate 
incubated with both Smd1 and Smd2 antibodies; E, gut homogenate 
incubated with control antibody; F, isotonic saline alone. 
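
The reported reductions can be restated as residual activity to make the comparison explicit. This is a minimal sketch using only the three percentages given above. The "naive additive expectation" is an illustrative assumption, not part of the original analysis, and zone size need not scale linearly with the amount of active peptide.

```python
# Percentage reduction in the zone of inhibition, as reported in the text.
reductions = {"anti-Smd1": 53.0, "anti-Smd2": 47.0, "both": 59.0}

# Activity remaining after each antibody treatment, relative to zone A.
residual = {treatment: 100.0 - r for treatment, r in reductions.items()}

# ASSUMPTION for illustration only: if the two single-antibody effects
# simply added, the combined reduction would be capped at 100%.
naive_additive = min(100.0, reductions["anti-Smd1"] + reductions["anti-Smd2"])

if __name__ == "__main__":
    for treatment, value in residual.items():
        print(f"{treatment}: {value:.0f}% of the inhibition zone remains")
    print(f"naive additive expectation: {naive_additive:.0f}% reduction")
```

About 41% of the zone remains when both antibodies are present. The combined reduction is well below what simple additivity would suggest, which fits the caution expressed in the text about interpreting these assays quantitatively.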

Discussion 

Smd1 and 2 peptides are constitutively produced but ingestion 
of the blood meal induces up to a fortyfold increase in 
defensin production in the gut (Fig. 4). The data presented 
here combined with previous data (Lehane et al., 1997; 
Munks et al., 2001) suggest that production of these 
proteins may be regulated post-transcriptionally. Post- 
transcriptional regulation of genes appears to be a 
common phenomenon in the midgut of blood-sucking 
insects (Muller et al., 1995; Lehane et al., 1998; Noriega & 
Wells, 1999) and may be a consequence of the selective 
advantage to haematophagous insects of rapid blood meal 
digestion (Lehane, 1991). 

Although we and others have previously demonstrated 
antimicrobial gene mRNA in insect midgut (Lehane et al., 
1997; Dimopoulos et al., 1997; Hao et al., 2001) the data 
presented (Figs 3C and 4) are the first direct evidence that 
an antimicrobial peptide is secreted into the insect gut 
lumen. The massive response to the blood meal contrasted 
with the weak response to sugar meals (Fig. 4), and the 
high concentration of defensins in the anterior midgut 
(where the undigested blood meal is stored) compared 
to the posterior midgut (where the blood meal is 
digested), support the conclusion that midgut 
defensins help to protect the blood meal from bacterial 
attack during the 24 h before the meal is fully digested by 
the fly (Lehane et al., 1997). This conclusion is strengthened 
by the fact that mRNA for Smd1 and 2 are only found 
in the anterior midgut (Lehane et al., 1997; Munks et al., 
2001) and by the rapid decline in defensin proteins in the 
gut lumen 24 h post blood meal when the final portions of 
the stored blood meal have been passed through the gut for 
digestion. The data presented (Fig. 6) suggest that these 
two defensins may form only a part, 60% as crudely esti- 
mated by zone assays (Fig. 6, zone D), of the anti-Gram 
positive activity to be found in the anterior midgut. However, 
we were unable to establish that the concentrations of anti- 
bodies used in this assay would fully inhibit purified 
defensins, therefore these results merely lead us to specu- 
late that there is other anti-Gram positive antimicrobial 
activity in the anterior midgut. 

Our data, particularly the spiking experiments (Fig. 3C), 
suggest that while in the reservoir tissues the defensins are 
bound to Ssp3. The presence of a doublet in the spiking 
experiment (Fig. 3A) may reflect the presence of a modified 
form of Ssp3. SDS-stable complexes of this sort are well 
known (Kato et al., 2001). Once the material is secreted 
into the midgut lumen the two proteins become dissociated 
(Fig. 3B), which suggests that the Ssp3 defensin aggregate 
is not the active unit in the midgut lumen. The discrepancy 
between the apparent molecular weight of Ssp3 at > 26 kDa 
and the predicted molecular weight of 24.205 kDa may be 
due to the presence of bound defensin, possible post- 
translational modifications of the protease, incomplete 
reduction of the protease during sample preparation or any 
combination of these possibilities. At present we do not 
know the mechanism of the interaction(s) between Smd1 
defensin peptide and the Ssp3 serine protease. Indeed the 
number of binding sites available to the defensin on the 
serine protease and whether the defensin can bind to itself 
when associated with the protease remains to be elucidated. 

The full sequence of mature Ssp3 was used to search 
GenBank using Blastp (Altschul et al., 1997), which produced 
multiple alignments with serine proteases, notably 
insect and vertebrate trypsins and vertebrate elastases. 
These included interesting examples such as Met-ase-1 
(granzyme M), a serine protease from cytolytic granules of 
rat CD3(-) large granular lymphocytes (Kelly et al., 1996) 
believed to play a part in innate immune responses in 
vertebrates (Sayers et al., 2001). Inspection of these 
aligned sequences shows Ssp3 contains the catalytic 
triad, His, Asp and Ser in that order and that the highly 
conserved regions surrounding the His and Ser residues 
(which are typical of serine proteases) are also conserved 
(Kraut, 1977). Ssp3 also has the six highly conserved 
cysteine residues at the positions that would allow the 
formation of the three cysteine bonds typical of invertebrate 
serine proteases and differentiating them from the verte- 
brate enzymes, which have four such bonds. We conclude 
that Ssp3 belongs to the peptidase family S1; its particular 
substrate specificity and function needs to be determined 
empirically. We note that blood meal digestion does not 
occur in the reservoir region of the fly where Ssp3 is produced, 
suggesting a non-digestive function for Ssp3. This is 
supported by both the widespread distribution of Ssp3 
mRNA in the body (Fig. 5) and by the relatively poor homology 
of this serine protease to the two already described 
digestive serine proteases from S. calcitrans (Lehane 
et al., 1998). It is unlikely that Ssp3 is involved in haemolysis 
of the blood meal in the reservoir because haemolysis 
does not occur in this zone (M. J. Lehane, unpublished 
observation). We have considered the possibility that Ssp3 
may play a role in the enzymatic activation of defensin from 
the pro- to the mature form, which is known to be a key reg- 
ulatory step in vertebrates (Wilson et al., 1999). It is possi- 
ble that Ssp3 may be involved in similar activity in the fat 
body as it is also present in this tissue. Interestingly the 
cleavage site leading to mature Smd1 is Ala-Ala, which is 
the preferred site of elastase and the homology searches 
suggest good homology between Ssp3 and vertebrate 
elastases. This may be a function of Ssp3. However, enzymatic 
activation would not require an SDS-stable linkage 
between the two molecules. A possible function of this tight 
association may be the inactivation of either the defensin or 
the trypsin or both while they are within the tissues. Although 
we are not aware of any other examples of defensin-serine 
protease associations, other SDS-stable complexes have 
been extensively described for vertebrates, in particular the 
interactions of serpins with proteases, such as the inhibition 
of granulocyte proteases by the intracellular serpin, proteinase 
inhibitor 6 (PI-6) (Scott et al., 1999). 

Serine protease has already been immunocytochem- 
ically located in S. calcitrans reservoir zone secretory 
granules (Jordao et al., 1996) and it seems probable that 
the Ssp3 defensin aggregate is localized there. Colocaliza- 
tion of proteases with antibacterial peptides within single 
secretory vesicles is well documented in vertebrates. For 
example, it has been shown in mouse Paneth cell granules 
that defensins (cryptidins) require proteolytic activation by 
the metalloproteinase matrilysin, which is colocalized in 
its granules (Wilson et al., 1999). An example of a different 
type of association is given by the azurophil granule, a 
specialized lysosome of neutrophils. It contains two families 
of antimicrobial proteins, each with four members: the 
defensins, comprising human neutrophil protein 1, -2, -3 
and -4, on the one hand and the serprocidins, comprising 
cathepsin G, elastase, proteinase 3 and azurocidin, on the 
other (Gabay & Almeida, 1993). Interestingly, antibacterial 
activity has been reported in an antimicrobial peptide-associated 
serine protease from the midgut cells of the fly 
Sarcophaga peregrina (Tsuji et al., 1998). The activity of 
this molecule is an intrinsic characteristic of the protein not 
related to its protease activity. This protease is found in the 
yellow body, which is formed from primordial adult midgut 
cells in the puparium. In a fascinating parallel with our 
studies it has been found that an antibody to the antibacterial 
peptide sarcotoxin IA (cecropin family) binds to this 26 kDa 
protease (Nakajima et al., 1997). The authors suggest either 
cross-reactivity of their antibody with the protease, or that 
sarcotoxin IA forms an SDS-stable complex with the 26 kDa 
protease. Our data strongly support the latter hypothesis 
for the following reason. Smd1 is limited to the reservoir 
region (Lehane et al., 1997; Munks et al., 2001) while Ssp3 
is widely distributed in the tissues of the adult (Fig. 5) 
and Western analysis shows banding only in the reservoir 
region where both are present (Fig. 3B). So evidence is 
accumulating in insects that antibacterial peptides and 
proteases are colocalized in tissues and, as in vertebrates, 
that the associated protease may also be antibacterial. 

Interactions of insect immune molecules with proteases 
deserve further study, particularly if immune systems are 
to be targeted for genetic manipulation in vector-borne 
disease control. 

Experimental procedures 

Insects 

S. calcitrans was cultured as previously described (Blakemore 
et al., 1993). The artificial blood meal (Lehane et al., 1998) and 
sugar meals were made with high purity water (18 MΩ). 

Production of recombinant defensins 

Recombinant proteins were produced using a commercial Pichia 
pastoris system (Invitrogen). The full Smd1 or Smd2 sequence, 
preceded by the sequence defining the KEX2 cleavable segment 
(Glu-Lys-Arg) of the α-factor mating signal, was generated by PCR 
and inserted into the XhoI/SnaBI or the XhoI/EcoRI site, respectively, 
of the plasmid pPIC9. pPIC9 was linearized with SalI and 
transformed into GS115. Pichia was grown in minimal glycerol 
medium and the inserted gene expressed in minimal methanol 
medium. The product was purified by HPLC (Lehane et al., 1997) 
and protein expression levels were determined by the Bradford 
method (Bio-Rad). 

Antibody production 

Unique regions of Smd1 and Smd2 were identified by amino acid 
sequence comparisons and short amino acid sequences (10-mers) 
were commercially synthesized (MWG-Biotech, Germany). The 
synthetic peptides were conjugated to bovine thyroglobulin 
(Sigma-Aldrich, UK) via glutaraldehyde (Adrian, 1997). The primary 
immunization of female New Zealand White rabbits consisted of 
100 nmol of conjugated peptide emulsified with complete Freund's 
adjuvant (Difco Laboratories, Michigan, USA); 1 ml total volume 
was administered over five subcutaneous (s.c.) sites. Subsequent 
immunizations consisted of 50 nmol of conjugated peptide 
emulsified in incomplete Freund's adjuvant and administered 
over five s.c. sites. Pre-immune serum was collected from each 
rabbit before the immunization regime commenced. A test bleed 
was taken from each animal after it had received the primary 
immunization and two booster injections, which were given at 
fortnightly intervals. 

Each serum was tested by ELISA for reactivity to the synthetic 
peptide against which it was raised and was subsequently tested 
against recombinant Smd1 and Smd2 for sensitivity, specificity 
and cross-reactivity in ELISA and Western blotting. Three sera are 
used in this study: MJL3, a serum raised against amino acid numbers 
1-10 in the mature Smd1 peptide (AAKPMGITCD); MJL8, a 
serum raised against amino acid sequence numbers 18-27 in the 
mature Smd2 peptide (AHCLLLGKSG); and MJL10, a control 
serum from a rabbit immunized with all of the components used for 
immunizations but without any synthetic peptide. 

Haemolymph collection and antigen preparation 

Haemolymph was collected from twenty-five adult S. calcitrans by 
puncturing the interthoracic membrane with a 0.33 mm diameter 
needle attached to a 1 ml syringe (Micro-Fine +, Becton Dickinson, 
UK) and withdrawing haemolymph from the body cavity, taking 
care to avoid contamination of haemolymph with gut tissue or 
contents. Saturated phenylthiocarbamide (2 µl) was added to the 
pooled haemolymph sample to prevent coagulation. Samples 
were centrifuged at 12 000 g for 3 min and the supernatant used 
for ELISA. 

Midgut dissection and antigen preparation 

Anterior midguts (proventriculus, thoracic midgut and reservoir), 
reservoir zones alone, opaque zones alone or lipoid zones alone 
were dissected into 154 mM NaCl (pH 7.2), frozen immediately in 
liquid nitrogen and stored at -80 °C until required. Groups of 
twenty-five samples were homogenized in 100 µl of 154 mM NaCl 
(pH 7.2) and centrifuged at 9000 g for 15 min. The supernatant 
was removed and used in subsequent experiments. Contents of 
the reservoir zone or lipoid zone lumen were collected by removing 
the midgut region as described above and gently applying pressure 
along the length of the tissue with a Micro-Fine + needle 
that had been bent at right angles to the syringe barrel. Care was 
taken not to damage the gut tissue; the absence of accidental 
damage was assessed microscopically. The expelled contents 
were collected and pooled from twenty-five flies then stored at 
-80 °C until required. Thawed gut lumen samples were used 
directly in experiments. 

Enzyme-linked immunosorbent assay (ELISA) 

Microtitre plates (Type M29A, F-form, PS microplates, Dynatech, 
West Sussex, UK) were coated with 5 µg/ml of the appropriate 
antigen in carbonate buffer (15 mM sodium carbonate, 35 mM 
sodium hydrogen carbonate, pH 9.6), overnight at 4 °C. Coating 
conditions were determined by checkerboard titration. One 
hundred microlitre volumes per well of antigen solution, primary 
and secondary antibodies and substrate were used throughout. The 
plates were washed three times in PBS + Tween-20 (PBS-Tween; 
145 mM sodium chloride, 2 mM sodium dihydrogen orthophosphate, 
4 mM disodium hydrogen phosphate, 0.05% Tween-20, 
pH 7.2) between each step. The plates were blocked with 200 µl/ 
well of 5% skimmed milk powder (Marvel) in PBS-Tween for 2 h at 
ambient temperature. After washing, the plates were incubated 
with the appropriate rabbit antisera (diluted 1 : 5000 in PBS- 
Tween) for 2 h at room temperature. After further washing the 
plates were incubated with horseradish peroxidase-labelled goat 
anti-rabbit IgG conjugate (Nordic Immunology, Tilburg, the Netherlands; 
diluted 1 : 1000 in PBS-Tween) for 2 h at room temperature. 
Following further washing the extent of binding was measured 
colourimetrically after the addition of 300 µl of 2,2'-azino-bis(3- 
ethylbenzthiazoline-6-sulphonic acid) diammonium salt (300 µl of 
a 20 mg/ml solution diluted in 9.7 ml of 2.3% citric acid solution, 
pH 4.0) in the presence of hydrogen peroxide (10 µl of 30% v/v 
H2O2). Optical density was measured in an ELISA plate reader 
(Titertek Twinreader, version 2.01) at wavelength 405 nm (OD405nm). 
The optical density of each antigen preparation was measured in 
triplicate for each antibody used. 

In order to account for day-to-day and plate-to-plate variability 
and the different volumes of tissues and/or their contents analysed 
in this study, two calculations were used. The first was a correction 
factor that allowed direct comparison between plates, the second 
was the expression of corrected OD405nm values as units per tissue 
per fly (UTF). 

Correction factor calculation 

Each plate contained positive reference peptides (against which 
the antisera were raised). If the mean optical density of positive 
reference wells was not exactly 1.0, the correction factor was applied. 
This correction factor was 1.0/mean optical density of reference 
positive wells. The mean OD405nm for each antigen preparation was 
then multiplied by the correction factor obtained for that antibody. 
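
The plate correction described above can be sketched as follows. The function names and the example optical densities are hypothetical; only the rule itself (factor = 1.0 / mean OD of the positive reference wells, applied to the mean sample OD) comes from the text.

```python
from statistics import mean


def correction_factor(reference_ods):
    """Return 1.0 divided by the mean OD of the positive reference wells."""
    return 1.0 / mean(reference_ods)


def corrected_od(sample_ods, reference_ods):
    """Mean OD of the (triplicate) sample wells scaled by the plate factor."""
    return mean(sample_ods) * correction_factor(reference_ods)


if __name__ == "__main__":
    # HYPOTHETICAL readings, for illustration only.
    reference_wells = [0.78, 0.80, 0.82]   # positive reference peptide wells
    sample_wells = [0.36, 0.40, 0.44]      # one antigen preparation, in triplicate
    print(f"correction factor: {correction_factor(reference_wells):.3f}")
    print(f"corrected OD405nm: {corrected_od(sample_wells, reference_wells):.3f}")
```

Scaling every plate so that its positive reference reads 1.0 puts readings from different days and plates on a common scale before they are compared.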

Expression of units per tissue per fly (UTF) 

The equation for this was: xy/5, where x = volume of sample 
collected and y = total protein expressed as µg/µl. 

The corrected optical density was multiplied by the resulting 
factor, which was then divided by the number of flies used in the 
group (typically twenty-five), thus giving UTF. UTFs were also calculated 
for the control sera and subtracted from the UTFs obtained for 
anti-Smd1 and anti-Smd2 sera. 
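
A minimal sketch of the UTF calculation as it reads from the description above. The function names and example values are hypothetical. The volume is assumed here to be in µl (the text does not state the unit), and the constant 5 is taken directly from the equation without assuming its origin.

```python
def utf(corrected_od, volume, protein_ug_per_ul, n_flies=25):
    """Units per tissue per fly: corrected OD x (x*y/5) / number of flies.

    volume            -- x, volume of sample collected (ASSUMED to be in µl)
    protein_ug_per_ul -- y, total protein in µg/µl
    """
    factor = volume * protein_ug_per_ul / 5.0
    return corrected_od * factor / n_flies


def net_utf(antiserum_od, control_od, volume, protein_ug_per_ul, n_flies=25):
    """UTF for a specific antiserum minus the UTF for the control serum."""
    specific = utf(antiserum_od, volume, protein_ug_per_ul, n_flies)
    background = utf(control_od, volume, protein_ug_per_ul, n_flies)
    return specific - background


if __name__ == "__main__":
    # HYPOTHETICAL inputs: 100 µl of sample at 2.5 µg/µl from 25 flies.
    print(f"UTF: {utf(0.5, 100.0, 2.5):.2f}")
    print(f"net UTF: {net_utf(0.5, 0.05, 100.0, 2.5):.2f}")
```

Subtracting the control-serum UTF removes background binding, so the reported value reflects only signal attributable to the anti-Smd1 or anti-Smd2 serum.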

Tris-tricine SDS-PAGE and Western blotting 

Proteins diluted in sample buffer (2 x sample buffer: 0.1 MTris, 4% 
SDS, 5% 2-mercaptoethanol, 0.01% Coomassie blue G250, 
pH 6.8) and separated in Tris-tricine SDS-PAGE (Schagger & von 
Jagow, 1987) using a 16.5%T/3%C separating gel, a 10%T/ 
3%C spacer gel and a 4%T/3%C stacking gei. Polypeptide 
molecular weight markers (Bio-Rad, UK) were simultaneously 
electrophoresed. After electrophoresis protein samples were 
stained with Coomassie Blue, silver stain or electrotransferred on 
to nitrocellulose paper (NCP) for Western blotting. Proteins separ- 
ated by SDS-PAGE were electrotransferred to NCP (Hybond-C 
extra, Amersham Life Sciences, UK) according to previously 
published methods (Towbin et ai, 1979). After transfer, the marker 
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lanes were removed and stained with a colloidal gold solution 
(Protogold, British BioCelt International) according to the manu- 
facturers instructions. The remainder of the blots were blocked in 
5% skimmed milk powder (Marvel) in Tween transblotting solution 
(TTBS, 20 mMTris, 0.9% NaCi, 0.1%Tween-20, pH 7.2) for 2 h at 
ambient temperature. The blots were washed (three times for 
5 min) in TTBS, cut into strips and probed with the appropriate 
rabbit antisera (diluted 1 : 75 in TTBS) for 2 h at room temperature. 
After subsequent washing (three times for 5 min in TTBS) the blots 
were incubated in horseradish peroxidase-labelled goat anti-rabbit 
IgG (diluted 1 :1000 in TTBS, Nordic Immunology, Tilburg, the 
Netherlands) for 2 h at room temperature. After further washing 
the blots were developed by the addition of 4-chloro-1-naphthol 
(20 mg in 4 ml of methanol) in 16 ml of trans-blotting solution 
(TBS, 20 mM Tris, 0.9% NaCl, pH 7.2) containing hydrogen peroxide 
(10 µl of 30% v/v H2O2). The reaction was stopped by immersing 
the blots in distilled water. 

Purification of > 26 kDa band and N-terminal sequencing 

The > 26 kDa band was purified by preparative tris-tricine SDS- 
PAGE followed by excision of the correct band and electro elution 
of the protein of interest. During each PAGE run the ends of the 
protein band were cut from the gel and Western blots were carried 
out on each end of the gel with Smd1 antiserum. This enabled us 
to monitor that we had the correct band and allowed us to align the 
blotted ends with the gel in order to excise the band of interest. 
When we had a single band on tris-tricine SDS-PAGE we con- 
firmed its identity by blotting and excised and electro-eluted the 
protein again. The purified protein was then subjected to tris-tricine 
SDS-PAGE and electro-transferred on to polyvinylidene fluoride 
membrane (PVDF membrane, Sigma) in glycine-free CAPS buffer, 
pH 11.0. The membrane was stained with Coomassie blue and 
sent for commercial sequencing (Alta Biosciences, University of 
Birmingham, UK). One amino acid was observed per sequence 
cycle. 

cDNA library, cloning and sequencing 

A S. calcitrans adult midgut specific cDNA library, estimated to 
contain 1.4 × 10⁶ individual clones, was constructed in Lambda 
ZAP (Stratagene) according to the manufacturer's instructions. 
Eight hundred midguts were used to make the library. The library 
was plated using E. coli XL-1 Blue. A degenerate sense primer 
was constructed (5'-GGACAATCYCCXCAYCA-3') based on the 
Edman sequence information for the > 26 kDa protein. This 
degenerate primer and a universal M13-20 primer were then used 
in PCR with the midgut specific cDNA library to generate a 32P- 
labelled probe for screening the library. pBluescript phagemids 
were excised in vivo from the lambda vector using ExAssist helper 
phage and plated using E. coli XLOLR (Stratagene). DNA 
sequencing was carried out using a Beckman CEQ 2000XL capil- 
lary sequencer. 

RT-PCR 

The primers 5'-CATTGCTACTGGACCAGA-3' and 
5'-GGACAATTTCCTCACCA-3' were designed from regions of the 
> 26 kDa protease gene that are not highly conserved in homolo- 
gous sequences selected by Blastx. Poly A + RNA was extracted 
from fifteen anterior midguts (cardiac, thoracic and reservoir 
zones), fifteen posterior midguts (opaque and lipoid zones), six fat 



bodies and the remains of two carcasses (approximately equal wet 
weights of tissue) from adult S. calcitrans using the Dynabeads 
system. One tenth of the extract was used in RT-PCR using the 
Access RT-PCR system (Promega) with one cycle of 45 min at 
48 °C, one cycle of 2 min at 94 °C, forty cycles of 30 s at 94 °C, 
1 min at 60 °C and 2 min at 68 °C. Finally we performed one cycle 
for 7 min at 68 °C. Controls omitting reverse transcriptase were 
used to check for genomic DNA contamination. 

Zone inhibition assays 

Antibacterial activity was estimated using zone inhibition assays 
utilizing either Micrococcus luteus or E. coli D31 (Lehane et al., 1997). 
Diameters of zones were recorded following 24 h incubation 
at either 28 °C for M. luteus or 37 °C for E. coli D31. Midgut homogenates 
were filtered through a 0.2 µm nylon filter (Whatman) 
by centrifuging at 25 000 g for 15 min to remove any contaminating 
bacteria that possess antimicrobial activity (e.g. Serratia 
marcescens; J. V. Hamilton, unpublished results). Samples containing 
4 midgut equivalents were incubated with either 0.9% 
saline or antibody (diluted in saline) for 15 min at ambient temper- 
ature before loading into the wells of the agarose plate. The plates 
were incubated at 30 °C for 16-24 h and the zones of inhibition 
measured. 

The 99 residue human immunodeficiency virus type 1 proteinase has been expressed in Escherichia 
coli as part of an autocleaving fusion protein. Expression of the fusion protein is toxic to the host 
cells; however, yields of the released proteinase have been improved by optimising induction and 
harvest times to increase culture biomass and decrease degradation of the proteinase. Soluble 
proteinase was extracted from these cells by a simple and highly efficient three step process. N- 
terminal sequence analysis confirms that the enzyme preparation is highly pure and correctly 
autoprocessed. The proteinase cleaves peptide substrate IGCTLNFPISPIETV between F and P at 
pH 6.0 with a Km of 310 µM and a kcat of 14 s⁻¹. The enzyme is sensitive to its ionic environment, 
showing stimulation of activity at high salt concentrations, and shows a pH optimum of 5.5. © 1991 

The role of human immunodeficiency virus type 1 (HIV-1) in the onset of acquired 
immunodeficiency syndrome (AIDS) has been clearly demonstrated [1,2]. The HIV-1 proteins 
encoded by GAG, POL and ENV genes are expressed as precursor molecules which are then 
processed by proteolytic cleavage to yield the structural proteins and enzymes required for viral 
replication [3]. Processing of the GAG and GAG-POL precursor proteins is mediated by a virally 
encoded proteinase [4]. This 99 amino acid residue proteinase, encoded within the 5' end of the 
POL open reading frame [5], is released from the GAG-POL precursor by an autocatalytic 
processing event which cleaves at two flanking Phe-Pro sites [6-11]. The HIV-1 proteinase, like 
those encoded by other retroviruses, shows homology with the N- and C-terminal domains of the 
family of aspartyl proteinases [12-14], particularly within the active site region Asp-Thr-Gly motif. 
The sensitivity of the enzyme to pepstatin [7, 15-17] confirms that this is a true member of the 
aspartyl proteinase family. The 11kDa POL proteinase contains only one half of the residues which 
contribute to the classical aspartyl proteinase active site, however the recently determined crystal 
structures of the proteinases from Rous sarcoma virus and HIV-1 show that they form dimeric 
structures [18-20]. Processing of the GAG and POL precursors is specifically prevented by 
mutation of the proteinase active site aspartate (POL residue Asp93) to alanine, asparagine or 
threonine [6,10,16,21]. The latter mutation made in a proviral construct resulted in the production of 
immature non-infectious particles, thus showing that the HIV-1 proteinase is essential for viral 
reproduction and therefore a valid target for viral chemotherapy. 

ABBREVIATIONS: IPTG, isopropyl-β-D-thiogalactopyranoside; EPNP, 1,2-epoxy-(4-nitrophenoxy) 
propane; DAN, N-diazoacetylnorleucine; MES, 4-morpholine ethane sulphonic acid. 

In our initial attempts to express the HIV-1 proteinase in E. coli we were unable to detect the 
proteinase by Western blot. This was almost certainly due to the rapid arrest of culture growth, 
upon induction of proteinase expression, preventing accumulation of the proteinase. In order to 
increase the expression we have fused the proteinase to the highly expressed chloramphenicol acetyl 
transferase (CAT) gene in the plasmid pTCX2TT [24] and altered the induction and harvest times to 
increase production of the proteinase to levels that allow purification of the enzyme to near 
homogeneity. 

MATERIAL AND METHODS 
Plasmid Construction 

M13mp11 was modified by insertion of a synthetic linker (Figure 1a) into the multiple cloning site. 
The BglII-EcoRI POL fragment of lambda phage clone BH10 [5] was inserted between the BamHI 
and EcoRI sites of the modified M13 vector. Translation termination codons and a SalI site were 
introduced at the 3' end of the predicted POL proteinase coding sequence by SDM. These 
manipulations created an NcoI-SalI flanked sequence (designated MatP) encoding residues 1-167 of 
the POL orf, preceded by a Met-Gly dipeptide. The NcoI-SalI MatP fragment was fused with the 5' 
end of the CAT gene in plasmid pTCX2TT [24] via a synthetic EcoRI-NcoI linker (5'- 
AATTCGGATCCAACGG-3' and 5'-CATGCCGTTGGATCCG-3'). The resultant plasmid 
pTCEMatPl (Figure 1c) encodes a 27kDa CAT-MatP fusion protein under control of the IPTG 
inducible Tac promoter. Site directed mutagenesis was used to change the proteinase active site 
residue Asp93 (GAT) to Asn (AAT). The mutated form of MatP (designated MatP(D93N)) was 
fused to the CAT gene as described above to generate plasmid pTCEMatPl(D93N). Cultures of 
E. coli RB791 carrying the expression plasmids were grown in minimal media as described [24]. 
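
The two oligonucleotides of the synthetic EcoRI-NcoI linker can be checked for the expected annealing pattern. The sketch below is an illustrative check only, with hypothetical function names: it assumes each oligonucleotide carries a four-base 5' overhang and tests whether the remaining bases are reverse complements of each other.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def anneal_linker(top, bottom, overhang=4):
    """Return (top 5' overhang, bottom 5' overhang, duplex region).

    Raises ValueError if the regions after the overhangs do not base-pair.
    """
    top_core = top[overhang:]
    bottom_core = bottom[overhang:]
    if top_core != reverse_complement(bottom_core):
        raise ValueError("oligonucleotides do not anneal outside the overhangs")
    return top[:overhang], bottom[:overhang], top_core


top_overhang, bottom_overhang, duplex = anneal_linker(
    "AATTCGGATCCAACGG", "CATGCCGTTGGATCCG"
)
has_bamhi_site = "GGATCC" in duplex
```

The overhangs recovered this way are AATT and CATG, which are the cohesive ends left by EcoRI and NcoI respectively, and the 12 base pair duplex contains a BamHI recognition sequence.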

Proteinase Extraction, Purification and Analysis 

Cells were harvested from 10 one litre shake flask cultures by centrifugation at 10,000g for 20 
minutes at 4 °C. The proteinase was extracted from the cells, essentially as described [7]. Cells 
were resuspended at 2g (wet weight)/ml in buffer A (50mM MES, 2mM EDTA, 0.01% Triton X-100, 
pH 6.0) then extensively sonicated. The total cell lysate was diluted with 10 vol/vol cold 
acetone, left at -70 °C for 30 minutes then centrifuged at 10,000g for 30 minutes. The pellet was 
dried under nitrogen and stored at -70 °C until required. It was solubilised with buffer A containing 
0.5M NaCl and 40% glycerol and centrifuged at 100,000g for 30 minutes. The supernatant was 
diluted 10 fold into buffer B (50mM MES, 2mM EDTA, pH 6.0) and immediately loaded, at 
4ml/min, onto a Mono S (HR10/10) column. The enzymic activity was recovered by a 0-1M NaCl 
gradient (0-0.2M in 48ml, isocratic at 0.2M to 96ml, 0.2-0.6M to 224ml and 0.6-1.0M to 272ml) in 
buffer B. The active fractions were pooled and diluted with equal volume of 3.2M ammonium 
sulphate in buffer B. This was applied to the alkyl-superose (HR 5/5) column at 1.0ml/min and 
eluted, at 0.5ml/min, by applying a 1.6-0M ammonium sulphate gradient (1.6-0M in 4ml and 0M 
isocratic to 6.0ml) in buffer B. 1ml fractions were collected into tubes containing glycerol (50% 
v/v) and kept in ice. The pooled material was then applied to a gel permeation column (2.6cm x 
70cm) containing Superdex G75 matrix. The column was eluted with buffer B containing 0.5M 
NaCl and 10% glycerol. Fractions containing proteinase activity were pooled, the glycerol 
concentration raised to 50%, and stored at -70 °C. Proteinase activity was determined by incubating 
1-5µl of protein (0.05-2 µg) with 150µM substrate peptide (IGCTLNFPISPIETV) in buffer A 
containing 2M NaCl at 37 °C for up to 5 minutes in a final volume of 200µl. The reaction was 
terminated by adding 4µl of 80% acetic acid and boiling the mixture for 5 minutes. After 
centrifugation at 10,000g for 5 minutes, 100µl of the sample was applied to a reverse phase 
chromatography (PepRPC) column running at 1.0ml/min. Both substrate and products were resolved 
by an elution gradient of 0.1% TFA/water and 0.1% TFA/acetonitrile (0-15% acetonitrile in 1.5ml, 
isocratic at 15% to 3.5ml, 15-40% to 13.5ml and isocratic at 40% to 14.5ml) and quantified by 
integration of the peak areas at 214nm. The kinetic data was analysed using the ENZFITTER 
software package. 
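
The stepwise Mono S salt gradient given above can be written as a set of breakpoints, which makes it easy to look up the programmed NaCl concentration at any elution volume. This is a minimal sketch with hypothetical names; it assumes each segment is linear between the stated endpoints and ignores column dead volume.

```python
# (elution volume in ml, NaCl in M) breakpoints taken from the gradient description.
MONO_S_GRADIENT = [(0.0, 0.0), (48.0, 0.2), (96.0, 0.2), (224.0, 0.6), (272.0, 1.0)]


def nacl_molar(volume_ml, breakpoints=MONO_S_GRADIENT):
    """Programmed NaCl concentration (M) at a given gradient volume (ml)."""
    if not breakpoints[0][0] <= volume_ml <= breakpoints[-1][0]:
        raise ValueError("volume lies outside the programmed gradient")
    for (v0, c0), (v1, c1) in zip(breakpoints, breakpoints[1:]):
        if v0 <= volume_ml <= v1:
            # Linear interpolation within the segment (assumed linear).
            return c0 + (c1 - c0) * (volume_ml - v0) / (v1 - v0)
    raise ValueError("no gradient segment contains this volume")


midpoint_of_third_segment = nacl_molar(160.0)
```

The same breakpoint representation would serve for the ammonium sulphate and acetonitrile gradients by swapping in their volumes and concentrations.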

N-terminal sequence analysis was by automated Edman degradation using a gas phase sequencer 
(Applied Biosystems Model 477A). Identification and quantification of the phenylthiohydantoin 
derivatives was performed on line using an Applied Biosystems Model 120A PTH amino analyser 
with external quantification. The protein sample (200pmole in 50% acetonitrile, 0.1% TFA) was 
loaded on a pre-cycled polybrene filter and used for 15 cycles of sequencing. Protein concentration 
in the crude extract and the partially purified enzyme preparation was determined using the 
Bicinchoninic acid reagent [25]. Purified enzyme protein concentration was determined from its 
absorbance at 280nm using a value of E1% = 11.4 which was calculated from the extinction 
coefficients of both Trp and Tyr residues. 
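
The conversion from absorbance to protein concentration with a percent extinction coefficient can be sketched as below. E1% is the absorbance of a 1% (10 mg/ml) solution; the sketch assumes a 1 cm path length unless stated otherwise, and the function name and example absorbance are hypothetical.

```python
E1_PERCENT_280 = 11.4  # value quoted in the text for the purified proteinase


def protein_mg_per_ml(a280, e1_percent=E1_PERCENT_280, path_cm=1.0):
    """Protein concentration (mg/ml) from A280 using an E1% coefficient.

    A 1% solution is 10 mg/ml, hence the factor of 10.
    """
    return a280 / (e1_percent * path_cm) * 10.0


# Invented reading, for illustration only.
example_concentration = protein_mg_per_ml(0.57)
```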

RESULTS AND DISCUSSION 

The CAT-HIV-1 proteinase gene fusion vector pTCEmatPl (Fig 1c) encodes a 27kDa fusion 
protein containing the N-terminal 72 residues of CAT and the first 167 residues of the HIV-1 POL 
open reading frame, including the 99 residue proteinase, and the autocleavage site (Phe 68-Pro 69) 
at its N-terminus. The fusion protein was predicted to be capable of autocleavage to release the 
mature 11kDa proteinase. Induction of fusion protein expression resulted in arrest of culture growth 
(Fig 2). Cultures induced 1 to 4 hours after inoculation ceased to grow within 2-3 hours. Induction 
after 5 hours coincided with the onset of stationary phase growth of the uninduced control culture. 
The specific proteinase activity in the soluble extract of cultures induced after 1 to 4 hours was 
similar, but approximately two fold higher than that observed in soluble extracts of the culture 
induced after 5 hours. No growth retardation was observed in cultures induced to express the 
mutated fusion protein from pTCEmatPl (D93N), and only background proteinase activity was 
detected in the soluble fractions of these extracts. 

SDS-PAGE analysis was carried out on whole cell extracts from cultures of pTCEmatPl and 
pTCEmatPl (D93N) induced after 4 hours (A550 of 1.4-1.6) and harvested 30 minutes after 
induction. No specifically induced bands were visible on Coomassie blue stained gels of the 
pTCEmatPl extracts (Fig 3a). However, extracts of pTCEMatPl(D93N) cultures revealed a 
specifically induced protein of 27kDa, which corresponds to the predicted size of the fusion protein. 
Western blot analysis with the proteinase antisera (Fig 3b) showed that both plasmids inducibly 
express an immunoreactive 27kDa protein, although it is present in greater quantity in the 
pTCEmatPl (D93N) extract, as would be expected from the stained gel result. A second 
immunoreactive protein, which appears exclusively in the induced pTCEmatPl(D93N) extract with 
a molecular weight of approximately 24kDa, is thought to be a degradation product from the 27kDa 
fusion protein. An induced immunoreactive protein of 11kDa was found in the pTCEmatPl extracts 

[Figure 1a: linker sequence. Restriction sites, in order: EcoRI*, NcoI, BamHI, EcoRI, SalI] 

5' AATTGCCATGGGATTTTTTAGGGAGGATCCTCTAGATGAATTCCCTAATAACTAACTAAG 3' 
       CGGTACCCTAAAAAATCCCTCCTAGGAGATCTACTTAAGGGATTATTGATTGATTCAGCT 

          MetGlyPhePheArgGluAsp IleProEndEnd End End 

[Figure 1c: schematic labels: CAT (1-73), pol (1-167)] 


Figure 1. a) Synthetic linker used to modify M13mp11 prior to insertion of BglII-EcoRI 
POL fragment. 

b) The relationship of the NcoI-SalI matP gene fragment to the HIV-1 GAG and 
POL open reading frames. The relevant regions of the amino acid sequence 
encoded by MatP are indicated (numbering corresponds to the POL orf). 

c) Map of plasmid pTCEmatPl and a schematic of the fusion protein it encodes. 
The Phe-Pro autoprocessing site at the N terminus of the 99 residue protease is 
indicated in the fusion protein. 



but not in the pTCEmatPl(D93N) extracts and was therefore believed to be the released 99 residue 
HIV-1 proteinase. These results indicate that the fusion protein expressed from pTCEmatPl is 
processed to release the 11kDa proteinase, probably by cleavage at the Phe68-Pro69 junction, 
whereas the 27kDa fusion protein containing the mutated proteinase from pTCEmatPl(D93N) does 
not release the 11kDa protein, but accumulates in the cell. This latter result suggests that the 
specific processing event is carried out by the HIV-1 proteinase rather than an E. coli proteinase. We 
would normally expect the two gene fusions to be expressed at the same rate since they differ by a 
single base pair; however, while the mutated fusion protein is visible on Coomassie blue stained gel, 
the detectable levels of unmutated fusion protein and released proteinase are considerably lower. 
This probably reflects the decreased growth rate of cultures induced to express the proteinase fusion 
protein (Fig 2) noted above. The toxic effect and low yield of HIV-1 proteinase in E. coli has been 
reported previously [16,26] and utilised as a screen for inactivating mutations in the proteinase gene 
[26]. The poor accumulation of released proteinase is probably the result of both reduced expression 
of the fusion protein and increased degradation of the released proteinase and/or fusion protein in 

Figure 2. To assess the effect of expression from pTCEmatPl in E. coli RB791 a series of 
cultures were induced with IPTG (1mM) at different times after inoculation as 
indicated by arrows (#No IPTG). Culture growth was monitored at 550nm 
and cultures were harvested 2 hours after induction. 



these stressed cells. By sampling induced cultures at intervals post-induction we found that 
accumulation of the 11kDa proteinase peaks 15-45 minutes after addition of IPTG (data not shown). 
This corresponds to a period when the culture is still growing relatively rapidly (Fig 2). 

To maximise accumulation of the released proteinase, larger scale cultures, for purification of the 
11kDa enzyme, were therefore induced at an absorbance of 1.4-1.6 and harvested within 45 minutes. 
10g of cells harvested from shake flask cultures were used for the purification of the released 
proteinase. Acetone precipitation, detergent solubilisation and the subsequent centrifugation 
resulted in the removal of a large amount of cellular debris as well as the lipid. The extract was 
diluted 10 fold before applying it to the cation exchange column. Elution with a NaCl gradient 
resulted in over 80% of the activity being displaced at around 0.3M NaCl (Fig 4a). Fractions 
containing proteinase activity were pooled and then diluted (1:1) with buffer A containing 3.4M 
ammonium sulphate and loaded on the alkyl-Superose matrix. A considerable amount of UV 
absorbing material did not bind to this column and after washing with 10 column volumes, the 
column was developed with a decreasing linear gradient of ammonium sulphate (1.7-0.0M). 100% 
of the activity loaded was recovered in fractions 4 and 5 (Fig 4b). These were pooled and further 
chromatographed on the Superdex G75 gel permeation column. The enzyme appeared to elute with 

Figure 3. Gel analysis of whole cell extracts from pTCEmatPl and pTCEmatPl (D93N) 
cultures + and - induction with IPTG. Extracts were prepared from 1-2ml 
culture. Cell pellets were resuspended in 1 x gel loading buffer at 10 A550 
units per ml, boiled for 5 mins and 10µl loaded for analysis on a) Coomassie 
blue stained 20% SDS-PAGE [33] and b) Western blot immuno-stained [34] 
with proteinase specific peptide antisera. Rabbit antisera was raised to 
peptides corresponding to HIV-1 POL residues 93-99 and 107-113. Arrows 
indicate induced bands. Sizes of molecular weight markers are indicated in 
kDa. 



a retention volume consistent with it existing as a monomer of near 10kDa in size (Fig 4c), with 
>80% of the enzymic activity being recovered. The total amount of protein recovered after this final 
step was 200 µg and the specific activity was determined to be 45µmol/min/mg of protein. Increase 
in NaCl concentration to 2M resulted in the enzymic activity eluting from the Superdex G75 column 
at a volume consistent with it being dimeric. However, the separation was carried out at 0.5M NaCl 
to remove larger contaminating proteins. This effect of increasing ionic strength on the enzyme 
dimerisation and catalysis is discussed elsewhere [27]. 

Fractions containing proteinase activity were pooled and re-chromatographed on the alkyl-superose 
matrix to concentrate the enzyme. SDS-PAGE showed that the preparation was over 90% pure (Fig 
5). The amino acid analysis (Table 1) was consistent with the predicted composition of the 
proteinase. 15 cycles of N-terminal analysis by gas phase sequencing showed the following 
sequence PQITLWQRPLVITK, consistent with the known proteinase sequence and indicating that 
the proteinase is released by cleavage at the predicted F-P cleavage site. The purified enzyme was 
stored at -70 °C in buffer at pH 6.0. Autodegradation upon storage has been reported [28] but in that 
case the enzyme was stored in acetic acid (pH 4.0). The pH/activity profile displayed by our 
proteinase (Fig 6) is similar to that reported [18,29]. Maximum activity was detected at pH 5.5. The 
Km for the peptide, IGCTLNFPISPIETV, increases from 180µM at pH 5.0 to >1mM at pH 7.0. The 
effect of varying the pH on the inhibition with acetyl-pepstatin and the renin inhibitor H-261 has 

Figure 4. Column chromatography used to purify the released HIV-1 proteinase. These 
columns were loaded and run as described in Materials and Methods. 



a) FPLC mono S (HR 10/10) column profile. 20ml fractions were collected. 
Fractions 7 to 12 (elution volume 140 to 240ml) contained >80% of the 
detected proteinase activity. 

b) Alkyl-superose (HR 5/5) column profile. The proteinase activity recovered 
from this column corresponded to the major protein peak (elution volume 5 to 
8ml). 

c) Superdex G75 column profile. This column was calibrated with BioRad gel 
filtration standards. The elution volumes of these standards are indicated 
(kDa). All the proteinase activity eluted with the major peak between 1.3 and 
17kDa. 



also been reported [17]. Interestingly, increasing the pH from 4.7 to 7.0 produced a decreased 
binding of both compounds, with Ki for the first increasing 50 fold and the second only 5 fold. This 
dramatic effect may be the result of the ionisation property of the His residue in H-261 and the C- 

Figure 5. Silver stained 20% SDS-PAGE gel of purified HIV-1 proteinase. Lane a, 
molecular weight markers (kDa). Lane b, 300ng purified proteinase. The 
purified proteinase was TCA precipitated and washed with ethanol prior to 
electrophoresis [33] and silver staining [35]. 



terminus carboxyl group of acetyl-pepstatin. Hence the ionisation of the substrate or inhibitor as 
well as that of the catalytic Asp residues may have a major influence on their binding. The purified 
enzyme has been used to compare the cleavage of peptides containing the Y-P and M-M sites. The 
Km values, at pH 6.0, for peptides VSQNYPIVQNIG, ATMMQRG and IGCTLNFPISPIETV were 
230, 504 and 312µM respectively. These results were obtained under conditions which displayed 
near maximal proteinase activity ie. the enzyme was assayed in the presence of 2M NaCl. The Km 
values, whilst reflecting slightly different peptide sequences, are at least an order of magnitude lower 
than those previously reported [30,31]. 

From the kinetic analysis of peptide IGCTLNFPISPIETV hydrolysis, a turnover number of near 
14 s⁻¹ was calculated. Incubation of our purified enzyme with 10mM EPNP or DAN (EPNP and 
DAN are Asp modifying reagents) and 0.1mM Cu2+ resulted in both cases in 50% inhibition of the 
activity. Cu2+ alone caused 20% inhibition. This behaviour is consistent with HIV proteinase 
belonging to the aspartic proteinase family. 1mM iodoacetic acid and p-chloromercuribenzoate 
(thiol modifying reagents) each caused 75-80% inhibition. There are two thiol groups in the 99 
residue proteinase, at position 66 and 95. Cys 95 appears buried at the dimerisation interface [20], 
therefore it is likely that modification of its thiol group results in the near complete abolition of 
proteinase activity by preventing dimerisation. This suggests that the subunits are in dynamic 
equilibrium under the experimental conditions, since Cys 95 would not be available for modification 
in the dimer. 
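
The kinetic constants reported above for peptide IGCTLNFPISPIETV (Km 312µM at pH 6.0, turnover number near 14 s⁻¹) can be combined in a short Michaelis-Menten sketch. The function names are hypothetical, and simple Michaelis-Menten behaviour at a fixed enzyme concentration is assumed; the sketch takes no account of the monomer-dimer equilibrium discussed in the text.

```python
KCAT_PER_S = 14.0           # turnover number reported in the text
KM_MOLAR = 312e-6           # Km at pH 6.0 in 2M NaCl, reported in the text
ASSAY_SUBSTRATE_M = 150e-6  # substrate concentration used in the standard assay


def catalytic_efficiency(kcat_per_s, km_molar):
    """kcat/Km in M^-1 s^-1."""
    return kcat_per_s / km_molar


def fraction_of_vmax(substrate_molar, km_molar):
    """v/Vmax from the Michaelis-Menten equation, v = Vmax*[S]/(Km + [S])."""
    return substrate_molar / (km_molar + substrate_molar)


efficiency = catalytic_efficiency(KCAT_PER_S, KM_MOLAR)
saturation = fraction_of_vmax(ASSAY_SUBSTRATE_M, KM_MOLAR)
```

With these inputs kcat/Km comes out at roughly 4.5 × 10⁴ M⁻¹ s⁻¹, and the 150µM substrate used in the standard assay corresponds to about one third of Vmax, so assay rates at that concentration sit well below saturation.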

CONCLUSION 

We have overcome the toxic effect of expressing HIV-1 proteinase in E. coli by using the highly 
expressed CAT gene to rapidly produce the proteinase as an autoprocessing fusion protein and 






Vol. 175, No. 3, 1991 BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS 



Table 1 Amino acid composition analysis of the purified proteinase. The protein was concentrated, 
and the buffer constituents removed by reverse phase chromatography (ProRPC HR 5/10). The 
protein was eluted by applying a linear gradient (0-100%) of 0.1% TFA/water and 0.1% 
TFA/acetonitrile. The material eluting at around 52-54% acetonitrile was freeze dried and 
hydrolysed in the HCl vapour phase using a Waters Pico Tag Workstation. Phenylisothiocyanate 
(PITC) derivatives, with aminobutyric acid as internal standard, were separated by reverse phase 
chromatography essentially using the manufacturer's instructions (Waters-Millipore, UK). 

altering induction and harvest times to maximise accumulation of the released proteinase. The 
enzyme has been purified to near homogeneity and is found to be highly active against various peptide 
substrates containing predicted HIV cleavage sites. The enzyme is inhibited by DAN, EPNP, IVA- 
pepstatin, acetyl-pepstatin and acetyl-pepstatinamide. Its sensitivity to iodoacetic acid and p- 
chloromercuribenzoate may well be due to the presence of a thiol group (Cys95) buried at the 
dimerisation interface which becomes modified when exposed in the free monomer. The 
modification of the thiol group in HIV proteinase may therefore result in destabilisation of 
dimerisation. Kinetic analysis of the cleavage of peptide IGCTLNFPISPIETV demonstrates that the 
purified enzyme is efficient with a kcat/Km of around 10⁴ M⁻¹ s⁻¹. Variations in the kinetic 
constants reported by others have been discussed [29]. It is likely that different assay conditions, pH 

Figure 6. The effect of pH on the initial activity (V0) of the purified HIV-1 proteinase. 

The enzyme was assayed as described in Materials and Methods with 2mM 
EDTA in the reaction mixture buffered with 50mM glycine-HCl (pH 3.0-3.5), 
50mM Na-acetate (pH 4.0-4.5), 50mM Mes (pH 5.0-7.0) or 50mM Tris-HCl 
(pH 7.5-8.5). 150µM peptide IGCTLNFPISPIETV was used as substrate. 

and ionic strength have a strong influence on this. Any analysis of the effect of ionic strength on the 
catalysis in terms of increased binding of substrate [32] must also take into account the influence of 
ionic strength or pH on dimerisation [27]. 

ACKNOWLEDGMENTS 

We would like to thank Ms J Sparks, Ms E M J Roud-Mayne and Mr A C Marshall for technical 
assistance; Dr G Turcatti and Dr P Wingfield (Glaxo Institute of Molecular Biology, Geneva, 
Switzerland) for N-terminal sequencing; and Mr B A Coomber, Dr C Christodoulou, Dr J Kitchen and 
Dr P Seal (GGR, Greenford) for in house synthesis of oligonucleotides and peptides. 

REFERENCES 

1. Barre-Sinoussi, F., Chermann, J.C., Rey, F., Nugeyre, M.T., Chamaret, S., Gruest, J., 
Dauguet, C., Axler-Blin, C., Vezinet-Brun, F., Rouzioux, C., Rozenbaum, W. and 
Montagnier, L. (1983) Science, 220, 868-870. 

2. Gallo, R.C., Salahuddin, S.Z., Popovic, M., Shearer, G.M., Kaplan, M., Haynes, B.F., Palker, 
T.J., Redfield, R., Oleske, J., Safai, B., White, G., Foster, P. and Markham, P. (1984) 
Science, 224, 500-502. 

3. Coffin, J.M. (1982) in Molecular Biology of Tumor Viruses, RNA Tumor Viruses, Eds 
Weiss, R., Teich, N., Varmus, H., and Coffin, J. (Cold Spring Harbor Lab., Cold Spring 
Harbor, NY) pp 261-368. 

4. Wong-Staal, F., and Gallo, R.C. (1985) Nature, 317, 395-403. 

5. Ratner, L., Haseltine, W., Patarca, R., Livak, K.J., Starcich, B., Josephs, S.F., Doran, E.R., 
Rafalski, J.A., Whitehorn, E.A., Baumeister, K., Ivanoff, L., Petteway, S.R. Jr., Pearson, 
M.L., Lautenberger, J.A., Papas, T.S., Ghrayeb, J., Chang, N.T., Gallo, R.C., and Wong- 
Staal, F. (1985) Nature 313, 277-284. 




6. Mous, J., Heimer, E.P., & Le Grice, S.F.J. (1988) J. Virol. 62, 1433-1436. 

7. Hansen, J., Billich, S., Schulze, T., Sukrow, S. and Moelling, K. (1988) EMBO J. 7, 1785- 
1791. 

8. Debouck, C., Gorniak, J.G., Strickler, J.E., Meek, T.D., Metcalf, B.W. and Rosenberg, M. 
(1987) Proc. Natl. Acad. Sci. USA, 84, 8903-8906. 

9. Lillehoj, E.P., Salazar, F.H.R., Mervis, R.J., Raum, M.G., Chan, H.W., Ahmad, N. and 
Venkatesan, S. (1988) J. Virol. 62, 3053-3058. 

10. Le Grice, S.F.J., Mills, J. and Mous, J. (1988) EMBO J. 7, 2547-2553. 

11. Graves, M.C., Lim, J.J., Heimer, E.P. and Kramer, R.A. (1988) Proc. Natl. Acad. Sci. USA 
85, 2449-2453. 

12. Toh, H., Ono, M., Saigo, K. and Miyata, T. (1985) Nature 315, 691. 

13. Pearl, L.H. and Taylor, W.R. (1987) Nature 328, 482. 

14. Pearl, L.H. and Taylor, W.R. (1987) Nature 329, 351-354. 

15. Seelmeier, S., Schmidt, H., Turk, V. and Von Der Helm, K. (1988) Proc. Natl. Acad. Sci. 
USA 85, 6612-6616. 

16. Darke, P.L., Leu, C-T., Davis, L.J., Heimbach, J.C., Diehl, R.E., Hill, W.S., Dixon, R.A.F. 
and Sigal, I.S. (1989) J. Biol. Chem. 264, 2307-2312. 

17. Miller, M., Sathyanarayana, B.K., Wlodawer, A., Toth, M.V., Marshall, G.R., Clawson, L., 
Selk, L., Schneider, J. and Kent, S.B.H. (1989) Science 246, 1149-1152. 

18. Richards, A.D., Roberts, R., Dunn, B.M., Graves, M.C. and Kay, J. (1989) FEBS Letts. 247, 
113-117. 

19. Miller, M., Jaskolski, M., Rao, M.J.K., Leis, J. and Wlodawer, A. (1989) Nature 337, 576- 
579. 

20. Navia, M., Fitzgerald, P.M.D., McKeever, B.M., Leu, C-T., Heimbach, J.C., Herber, W.K., 
Sigal, I.S., Darke, P.L., and Springer, J.P. (1989) Nature 337, 615-620. 

21. Kohl, N.E., Emini, E.A., Schleif, W.A., Davis, L.J., Heimbach, J.C., Dixon, R.A.F., 
Scolnick, E.M. and Sigal, I.S. (1988) Proc. Natl. Acad. Sci. USA 85, 4686-4690. 

22. Schneider, J. and Kent, S.B.H. (1988) Cell 54, 363-368. 

23. Nutt, R.F., Brady, S.F., Darke, P.L., Ciccarone, T.M., Colton, C.D., Nutt, E.M., Rodkey, 
J.A., Bennett, C.D., Waxman, L.H., Sigal, I.S., Anderson, P.S. and Veber, D.F. (1988) Proc. 
Natl. Acad. Sci. USA 85, 7129-7133. 

24. Dykes, C.W., Bookless, A.B., Coomber, B.A., Noble, S.A., Humber, D.C. and Hobden, 
A.N. (1988) Eur. J. Biochem. 174, 411-416. 

25. Smith, P.K., Krohn, R.I., Hermanson, G.T., Mallia, A.C., Gartner, F.H., Provenzano, M.D., 
Fujimoto, E.K., Goeke, N.M., Olson, B.J. and Klenk, D.C. (1985) Anal. Biochem. 150, 76- 
85. 

26. Baum, E.Z., Bebernitz, G.A. and Gluzman, Y. (1990) Proc. Natl. Acad. Sci. USA 87, 5573- 
5577. 

27. Singh, O.M.P., Roud-Mayne, E.M.J. and Weir, M.P. (1990) in Retroviral Proteases: Control 
of Maturation and Morphogenesis. Ed. L.H. Pearl, Macmillan Press UK, in press. 

28. Strickler, J.E., Gorniak, J., Dayton, B., Meek, T., Moore, M., Magaard, V., Malinowski, N. 
and Debouck, C. (1989) Proteins: Structure, Function and Genetics 6, 139-154. 

29. Tomasselli, A.G., Olsen, M.K., Hui, J.O., Staples, D.J., Sawyer, T.K., Heinrikson, R.L. and 
Tomich, C-S.C. (1990) Biochemistry 29, 264-269. 

30. Meek, T.D., Dayton, B.D., Metcalf, B.W., Dreyer, G.B., Strickler, J.E., Gorniak, J.G., 
Rosenberg, M., Moore, M.L., Magaard, V.W. and Debouck, C. (1989) Proc. Natl. Acad. Sci. 
USA 86, 1841-1845. 

31. Moore, M.L., Bryan, W.M., Fakhouri, S.A., Magaard, V.W., Huffman, W.F., Dayton, B.D., 
Meek, T.D., Hyland, L., Dreyer, G.B., Metcalf, B.W., Strickler, J.E., Gorniak, J.G. and 
Debouck, C. (1989) Biochem. Biophys. Res. Communs. 156, 297-303. 

32. Richards, A.D., Phylip, L.H., Farmarie, W.G., Scarborough, P.E., Alvarez, A., Dunn, B.M., 
Hirel, P.H., Konvalinka, J., Strop, P., Pavlickova, L., Kostka, V. and Kay, J. (1990) J. Biol. 
Chem. 265, 7733-7736. 

33. Laemmli, U.K. (1970) Nature 227, 680-685. 

34. Towbin, H., Staehelin, T. and Gordon, J. (1979) Proc. Natl. Acad. Sci. USA 76, 4350-4354. 

35. Morrissey, J.H. (1981) Anal. Biochem. 117, 307-310. 



Journal of General Virology (1992), 73, 639-651. Printed in Great Britain 

Autoprocessing of the human immunodeficiency virus type 1 protease 
precursor expressed in Escherichia coli from a synthetic gene 

Viviane Valverde,1 Pierre Lemay,1 Jean-Michel Masson,1 Bernard Gay2 and Pierre Boulanger2* 

1 Centre de Transfert en Biotechnologies Microbiologie, CNRS-UA544, INSA, Complexe Scientifique de Rangueil, 31077 
Toulouse and 2 Laboratoire de Virologie et Pathogénèse Moléculaires, Faculté de Médecine, 34060 Montpellier, France 



A gene encoding an N-terminally extended precursor 
of 107 residues of the human immunodeficiency virus 
type 1 protease (PR107) was chemically synthesized 
and cloned into a bacterial expression vector, under the 
control of the araB promoter. PR107 was expressed 
alone or fused in phase to the amino or carboxy 
terminus of the bacterial β-galactosidase (β-gal). The 
yield of protease and β-gal was found to be significantly 
higher when the gene for PR107 was cloned upstream 
of the Escherichia coli lacZ gene (PR107-β-gal). 
Comparisons of the level of cloned protein expression 
between protease precursor and mature form suggested 
that this enhanced expression was due to the additional 
5' sequence of the PR107 gene, and occurred at the 
post-transcriptional level. Autoprocessing of protease 
precursor and its release from the β-gal fusion protein were 
analysed using wild-type and mutated cleavage sites. 
Mutations were introduced at amino acids downstream 
of the F-P scissile bond, at positions P4' and P5' in the 
C-terminal site (TLNF*PISP), and at position P3' in a 
consensus N-terminal site (TLNF*PQITL) placed at 
the protease-β-gal junction. The data obtained 
suggested that (i) autoprocessing at the carboxy-terminal 
F-P bond was not significantly influenced by the 
presence of the N-terminal precursor sequence, (ii) P4' 
and P5' substitutions in the C-terminal site had no 
effect on cleavage, and (iii) P3' in the N-terminal site 
tolerated a wide variety of substitutions. 



Introduction 

Specific processing of the human immunodeficiency 
virus type 1 (HIV-1) gag-pol polyprotein by the 
virus-encoded protease yields the structural gag gene products 
p24CA, p17MA and p15NC, as well as non-structural 
proteins, reverse transcriptase, endonuclease and 
protease (reviewed in Cann & Karn, 1989; Wills & Craven, 
1991). The HIV-1 protease originates from a large 
gag-prt-pol polyprotein precursor, presumably as a result of 
intermolecular autoprocessing events (Lillehoj et al., 
1988; Mous et al., 1988). It belongs to the aspartyl 
protease family and its active site contains the consensus 
D-T-G sequence (reviewed in Kräusslich & Wimmer, 
1988). It is active as a dimer (Katoh et al., 1989; Miller et 
al., 1989b) and is inhibited in vitro by pepstatin 
(Seelmeier et al., 1988; Katoh et al., 1987). Since specific 
proteolytic cleavages are essential for assembly of 
infectious HIV virions (Kohl et al., 1988; Peng et al., 
1989; Gelderblom, 1991), the protease represents one of 
the possible targets for enzyme-directed anti-AIDS 
therapy (reviewed in Skalka, 1989; Tomasselli et al.). 
The protease has been produced in bacteria 
(Debouck et al., 1987; Graves et al., 1988), chemically 
synthesized (Schneider & Kent, 1988), crystallized 
(Miller et al., 1989a, 1990; Navia et al., 1989) and 
co-crystallized with peptide-based specific inhibitors 
(Erickson et al., 1990; Miller et al., 1989b; Wlodawer et 
al., 1989). 

The aim of the present study was to analyse the 
mechanism of autocatalytic processing of HIV-1 
protease at its N and C termini, and its subsequent release 
from a fusion protein. For this purpose, as an alternative 
to in vitro chemical or in vivo biological synthesis, the HIV 
protease was expressed from a synthetic gene cloned into 
a bacterial expression vector. We cloned the prt gene in 
Escherichia coli, unfused or fused in phase to the 5' or 3' 
end of the bacterial lacZ gene (upstream or downstream 
position, respectively). Our approach has several 
advantages compared to previously described production 
systems: (i) a synthetic gene sequence has more 
versatility for further genetic manipulations; (ii) the 
protease gene sequence was designed for optimal codon 
usage in E. coli; (iii) since the cloned protease was 
expected to be toxic for the recipient cell (Hostomsky et 
al., 1989), the protease gene was cloned under the strong 
but tightly regulated araB promoter (Cagnon et al., 
1991); (iv) the yield of the β-galactosidase (β-gal)-fused 
gene products could be monitored easily using a simple 
β-gal enzymic assay; (v) purification of HIV-1 protease 






might be achieved by affinity chromatography of 
protease-β-gal fusion on immobilized β-gal substrate or 
anti-β-gal antibody. Bacterial β-gal has also been 
recently used to monitor the HIV protease activity upon 
insertion of one of its specific sites into the lacZ gene 
(Baum et al., 1990). 

The HIV-1 protease sequence is flanked by two 
processing sites, SFNF*PQIT at its N terminus and 
TLNF*PISPI at its C terminus. Cleavage of the protease 
precursor and release from the protease-β-gal fusion 
protein was analysed by mutagenesis of amino acids 
downstream of the F-P scissile bond, at positions P4' and 
P5' in the C-terminal processing site and at position P3' 
in a consensus N-terminal site placed at the 
protease-β-gal junction. The tolerance to amino acid residue 
substitutions was thus evaluated from the rate of 
protease-β-gal cleavage. We found that mutations at 
subsites P4' and P5' had no apparent effect on cleavage, 
but that the P3' position in the N-terminal site showed 
some sensitivity to amino acid substitutions with strong 
effects on secondary structure. 
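
The subsite positions referred to above (P4', P5', P3') follow the usual convention in which residues are numbered outward from the scissile bond, unprimed on the N-terminal side and primed on the C-terminal side. As an illustration only, the short Python sketch below labels the residues of a site written with `*` at the scissile bond. The function name `label_subsites` is hypothetical and is not part of the original work; the site strings are those quoted in the text.

```python
def label_subsites(site):
    """Return a dict mapping subsite labels (P1, P1', ...) to residues.

    `site` is a one-letter amino acid string with '*' marking the
    scissile bond, e.g. "TLNF*PQITL". Illustrative helper only.
    """
    n_side, c_side = site.split("*")
    labels = {}
    # N-terminal side: P1 is adjacent to the bond, numbers rise moving away.
    for i, residue in enumerate(reversed(n_side), start=1):
        labels["P%d" % i] = residue
    # C-terminal side: P1' is adjacent to the bond.
    for i, residue in enumerate(c_side, start=1):
        labels["P%d'" % i] = residue
    return labels


if __name__ == "__main__":
    natural_c_site = label_subsites("LNF*PISPIGL")   # natural C-terminal site
    consensus_n_site = label_subsites("TLNF*PQITL")  # consensus N-terminal site
    print(natural_c_site["P4'"], natural_c_site["P5'"])
    print(consensus_n_site["P3'"])
```

With this labelling, the isoleucine of TLNF*PQITL falls at P3', which is the position replaced by an amber codon in the suppressor experiments described in Methods.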



to as PRamMTZ. Amber suppressions at position P3' of this 
sequence gave rise to P3' mutants. A schematic drawing of the 
gene constructs is presented in Fig. 1 (a). 



Methods 

Bacterial strains and plasmids. E. coli strain TG-1 (Amersham) was 
used for cloning, and MC1061 (Casadaban & Cohen, 1980) for gene 
expression. Cells were grown in LB medium supplemented with 
ampicillin (100 µg/ml; LBamp). The unfused protease gene was 
constructed from oligonucleotide blocks directly assembled into 
pCris10, a derivative of pKK233-2 (Amman & Brosius, 1985) which 
differs from pKK233-2 by its synthetic cloning cassette (V. Valverde et 
al., unpublished results). β-Gal-protease fusion constructs were directly 
assembled into pARA14. Plasmid pARA14 contains an 
arabinose-inducible promoter, termed araB (Cagnon et al., 1991), and constituted 
our expression vector. For induction of gene expression, overnight 
cultures were diluted 20-fold in LBamp, and incubated at 37 °C until 
they reached an optical density at 600 nm (OD600) of 0.5. Arabinose 
was then added to the cultures to a final concentration of 0.2%. 

Nomenclature. PR100 and PR107 refer to the mature protease and 
protease precursor of 100 and 107 amino acid residues, respectively, 
including the initiator codon, methionine. When the prt gene was fused 
to the 5' end of the lacZ gene (upstream position), the resulting in-phase 
fusion protein was called PR107-β-gal or PR100-β-gal. When prt was 
fused to the 3' end of lacZ (downstream position) the name was 
β-gal-PR107. No β-gal-PR100 fusion protein was constructed since this 
protein would have lacked the protease N-terminal processing site at 
the β-gal-protease junction. The protease mutant named G33 had a 
glycine replacing the aspartyl residue 33 in the PR107 precursor, which 
corresponded to D (25) in the active site of the mature protease. The 
corresponding upstream fusion product was termed G33-β-gal. 
RR-β-gal designates a β-gal-fused protease variant in which the basic 
dipeptide Arg-Arg at positions P4' and P5' from the scissile bond at the 
PR107-β-gal junction was substituted for the neutral Pro-Gly 
dipeptide. When a sequence homologous to the protease N-terminal 
processing site, but containing a suppressive amber stop codon in lieu 
of the isoleucine codon at position P3', was introduced at the 
downstream PR107-β-gal junction, the resulting plasmid was referred 



Gene constructions. Oligonucleotides were synthesized on an Applied 
BioSystems 380A DNA synthesizer (Applied BioSystems) and purified 
as trityl derivatives on Nensorb Prep columns (NEN Research 
Products) according to the manufacturer's specifications. They were 
enzymically phosphorylated as previously described (Normanly et al., 
1986). Two different HIV-1 prt genes were constructed. The first one 
encoded the mature form of protease of 100 amino acid residues, 
starting its sequence with the Met (1)-Pro (2) dipeptide, whereas the 
second encoded a precursor form of 107 amino acids. The PR107 gene 
contained an additional 5' nucleotide sequence encoding the 
heptapeptide (M)GTVSFNF (not counting the initiator methionine) present 
within the gag-prt-pol polyprotein, immediately upstream of the protease, thus 
reconstituting the original Phe-Pro dipeptide of the N-terminal 
protease processing site (Darke et al., 1988, 1989). 

Both genes for the unfused protease constructs PR100 and PR107 
were assembled stepwise from 20 complementary synthetic 
oligonucleotides annealed and ligated into three adapting building blocks (Fig. 
1 b). Oligonucleotides corresponding to the protease coding strand were 
given odd numbers, and the even numbers corresponded to the 
non-coding strand (Fig. 1 c). Step 1, oligonucleotides C1 to C8 were 
annealed and cloned between the HindIII and BglII restriction sites of 
pCris10. Step 2, the central part of the gene (oligonucleotides M1 to 
M6) was inserted between the BglII and NarI sites. Step 3, the 5' end of 
the PR100-encoding gene was reconstituted from oligonucleotide pairs 
N1-N2, N5-N6 and N7-N8, cloned between NarI and NcoI; for the 5' 
end of PR107, N3-N4 was inserted in lieu of N1-N2 (Fig. 1 c, and inset 
e). For optimal expression, the sequence was designed according to the 
codon usage of E. coli (Fig. 1 c). All gene constructs were verified by 
DNA sequencing (Sanger et al., 1977). PR100 and PR107 genes were 
then inserted into the pARA14 expression vector, under the control of 
the arabinose-inducible promoter araB (Cagnon et al., 1991). 

In-phase β-gal fusions were directly constructed into the pARA14 
expression vector already containing the gene for PR100 or PR107. The 
downstream fusion β-gal-PR107 was obtained by insertion of the plZ4 
NcoI β-gal gene cassette (V. Valverde et al., unpublished) into the NcoI 
site of PR107. The two upstream fusions PR100-β-gal and 
PR107-β-gal were generated by ligating the 5' end of the lacZ gene to the 3' end of 
the PR100 or PR107 gene. This was achieved by first cloning the L1 
linker sequence (Fig. 1 d) at the 3' end of the prt gene. The plZ2 cassette 
(V. Valverde et al., unpublished) was then inserted between the SalI 
and HindIII sites, within the L1 and pARA14 sequences, respectively. 
The L1 linker was then mutated into the L0 linker (Fig. 1 d), to obtain 
the PR100-β-gal and PR107-β-gal fusions. 

Site-directed mutagenesis at the catalytic site and protease-β-gal 
junction. The phosphorothioate method (Taylor et al., 1985) was used to 
introduce two substitutions into the active site of the protease, G and T 
substituting for A and C at nucleotides 100 and 101 of the PR107 gene, 
respectively (Fig. 1 d). This created a new KpnI site and changed the 
D (33) residue of the D-T-G triad (position 25 in the mature enzyme) 
into a glycine. This gave rise to the unfused G33 and β-gal-fused 
G33-β-gal mutant constructs (Fig. 1 a). 
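
The two residue numberings used above (position 33 in PR107, position 25 in the mature enzyme) differ by a fixed offset. A minimal sketch of the conversion follows, assuming, as the Nomenclature and Gene constructions paragraphs state, that PR107 starts with the initiator methionine followed by the heptapeptide GTVSFNF, and that PR100 starts with Met (1)-Pro (2). The function names are hypothetical and serve only to make the arithmetic explicit.

```python
# Leader sequences preceding Pro-1 of the 99-residue mature protease.
PR107_LEADER = "M" + "GTVSFNF"   # initiator Met + N-terminal heptapeptide
PR100_LEADER = "M"               # initiator Met only
MATURE_LENGTH = 99


def mature_to_pr107(position):
    """Convert mature-protease numbering to PR107 precursor numbering."""
    return position + len(PR107_LEADER)


def pr107_to_mature(position):
    """Convert PR107 numbering back to mature-protease numbering."""
    return position - len(PR107_LEADER)


def mature_to_pr100(position):
    """Convert mature-protease numbering to PR100 numbering."""
    return position + len(PR100_LEADER)


if __name__ == "__main__":
    # Catalytic aspartate: D25 of the mature enzyme is D33 in PR107.
    print(mature_to_pr107(25))
    # Construct lengths implied by the leaders.
    print(MATURE_LENGTH + len(PR100_LEADER), MATURE_LENGTH + len(PR107_LEADER))
```

The offset of eight residues reproduces both the D (33)/D (25) correspondence and the stated lengths of 100 and 107 residues.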

The protease-β-gal junction in PR100-β-gal and PR107-β-gal 
fusions consisted of the sequence TLNF*PISP, representing the 
C-terminal processing site of the protease. By our cloning strategy (see 
above), the RR-β-gal fusion was obtained first, since it contained the 
L1 linker encoding the junction sequence LNF*PISRRGL. Using the 
same method (Taylor et al., 1985), the L1 linker was then mutated into 
the L0 sequence (Fig. 1 d), encoding the sequence LNF*PISPGGI. 
This latter sequence better mimicked the natural C-terminal processing 
site LNF*PISPIGL in terms of side chain electric charges and overall 
conformation. 

Fig. 1. (a) Schematic representation and nomenclature of the various protease gene (prt) constructs with the N- and C-terminal 
autoprocessing sites. The symbols and abbreviations used are: β-gal, β-galactosidase; araB, arabinose-inducible promoter; O, F-P 
scissile bond; J, amber mutation; G*, D to G substitution in the protease active site; X, amino acid specified by amber suppressor 
tRNA at the P3' position in the downstream processing site consensus to the N-terminal site. (b) Diagram of the stepwise construction 
of HIV-1 mature protease gene (PR100). Three successive building blocks with appropriate sticky ends were inserted into the pCris10 
plasmid (Cagnon et al., 1991). (c) Nucleotide sequence of the synthetic gene for the protease precursor of 107 residues (PR107), coding 
for the protease of 99 amino acids with an initiation methionine residue (PR100), and an additional N-terminal heptapeptide 
(GTVSFNF*) encoded by a portion of the pol sequence upstream of the prt gene. (d) Protease-β-gal junction sequences. L0 encodes the 
sequence LNF*PISPGGI, consensus to the C-terminal processing site; L1 for the mutant sequence LNF*PISRRGL, in which the 
proline-isoleucine dipeptide at the P4' and P5' subsites was replaced by Arg-Arg. In the inset (e) are shown the two complementary 
oligonucleotides used to construct the gene for PR100 protease, N1 and N2 replacing N3 and N4 at the 5' end. 



Amber mutation suppressor tRNA scanning. A double-stranded 
oligonucleotide sequence (coding strand: 5' 
TGCACTCTGAACTTCCCTCAGTAGACTGGGGATCCA 3') encoding the 
consensus N-terminal processing site TLNF*PQITL was synthesized, 
with an amber codon replacing the isoleucine codon at position P3' of 
the scissile bond. The TLNF*PQamTL-encoding nucleotide sequence 
was then inserted at the PR107-β-gal junction, between the ApaLI site 
at the 3' end of the precursor protease gene and the HindIII site in the 
pARA plasmid. The PR107 gene with the stop codon-containing 
junction sequence was then fused to the 5' end of the lacZ gene, giving 
rise to the PRamMXZ plasmid. A set of 12 amber tRNA suppressor 
genes cloned in pct2 vector (Kleina et al., 1990a, b; Masson & Miller, 
1986; Normanly et al., 1986, 1990) were tested for co-expression with 
PRamWLZ in the MC1061 strain. The β-gal assay was used to estimate 
the efficiency of the amber suppression, as compared to the control 
PR107-β-gal fusion which contained a non-mutated C-terminal 
cleavage site as its junction sequence. Extracts from cells expressing 
PRamWLZ only were assayed for β-gal activity and showed that 
there was no detectable reinitiation of protein synthesis on the lacZ 
mRNA downstream of the amber codon. The extent of protease 
autoprocessing in the presence of suppressor tRNAs was estimated by 
the protein ratio in the 127K and 116K bands (Fig. 3), representing the 
protease-β-gal fusion protein and β-gal released from the protease, 
respectively. The 127K/116K doublets were scanned on anti-β-gal 
immunoblot patterns of SDS-polyacrylamide gels. 
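
As a check on the oligonucleotide quoted above, the sketch below translates the coding strand with the standard genetic code and locates the amber (TAG) codon. It is illustrative only: the sequence is copied as printed here, and reading it in frame from its first nucleotide is an assumption made for this example (it is the frame in which the residues TLNFPQ precede the stop codon). The names `translate` and `OLIGO` are hypothetical.

```python
# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Coding strand as printed in the text (5' to 3').
OLIGO = "TGCACTCTGAACTTCCCTCAGTAGACTGGGGATCCA"


def translate(dna, frame=0):
    """Translate `dna` in the given frame; stop codons are shown as '*'."""
    codons = [dna[i:i + 3] for i in range(frame, len(dna) - 2, 3)]
    return "".join(CODON_TABLE[c] for c in codons)


if __name__ == "__main__":
    peptide = translate(OLIGO)
    stop_index = peptide.index("*")
    print(peptide)
    print(OLIGO[3 * stop_index:3 * stop_index + 3])  # codon at the stop
```

In this frame the stop codon is TAG and directly follows the residues TLNFPQ, i.e. it occupies the P3' position of the F*P bond, where an amber suppressor tRNA would insert its amino acid.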

Enzyme assays. β-Gal activity was assayed in extracts from 
toluene-permeabilized cells by hydrolysis of o-nitrophenyl-β-D-galactoside 
(ONPG) (Miller, 1972). For protease assay, 1 ml aliquots of 
arabinose-induced cell culture taken at different time intervals were pelleted, 
washed in TN buffer (20 mM-Tris-HCl pH 7.5, 0.1 M-NaCl) and lysed 
in 50 µl of lysis buffer at 4 °C overnight with gentle mixing. Lysis buffer 
was made of 50 mM-Tris-HCl pH 7.5, 1 mM-DTT, 0.01% lysozyme, 
0.1% NP40, 1 mg/ml DNase I and a cocktail of protease inhibitors 
(Darke et al., 1988). The precursor to p24CAgag produced in a 
baculovirus expression system (Royer et al., 1991) was used as the 
substrate for protease. Aliquots (10 µl) of gag polyprotein (0.5 to 1.0 µg) 
in 0.2 M-sodium phosphate buffer pH 6.5, 1 M-NaCl, 5% glycerol and 
0.25% NP40 were incubated with 10 µl of bacterial cell lysate at 27 °C 
and the reaction was stopped at different time intervals by addition of 
20 µl of SDS-PAGE sample buffer. The gag precursor cleavage 
products were analysed by SDS-PAGE and immunoblotting with 
gag-specific antibody. 

Biochemical and immunological analyses. Bacterial proteins were 
analysed by SDS-PAGE (15%) in a discontinuous buffer system 
(Laemmli, 1970). In a typical experiment, 1 ml bacterial cell culture 
was pelleted, washed with TN buffer, resuspended in 100 µl of TN 
buffer, mixed with 100 µl of SDS-urea-2-mercaptoethanol sample 
buffer and denatured at 100 °C for 2 min. Gels were stained or 
electrically transferred to nitrocellulose membrane (BA85, Schleicher 
& Schüll) for 45 min at 180 mA in a semi-dry apparatus (Millipore 
SDE). Membranes were reacted with anti-β-gal (Research Plus) or 
anti-HIV-1 protease rabbit polyclonal antibody (a gift from Dr E. Cheng, 
DuPont) and a phosphatase-labelled anti-rabbit IgG conjugate 
(Sigma). Gag polyprotein products were detected on blots by reaction 
with an anti-pr55-p24CAgag mouse monoclonal antibody 
(Epiclone-5001; Epitope Inc.) and a phosphatase-labelled anti-mouse IgG 
conjugate (Sigma). ELISA was performed using commercially 
available bacterial β-gal (Sigma) and anti-β-gal antibody (Research Plus). 

For protein microsequencing, proteins were partially purified by 
acetone fractionation from bacterial cell lysates (Hansen et al., 1988). 
The acetone precipitate was then chromatographed on FPLC-MonoS 
column in 50 mM-sodium 2-(N-morpholino)ethanesulphonic acid 
buffer pH 6.5, 1 mM-disodium EDTA, 10% glycerol (Pharmacia). 
Protease was eluted at 0.2 M-NaCl and its purification was achieved by 
preparative SDS-PAGE in a Tricine-buffered system (Schägger & von 
Jagow, 1987). For β-gal and protease, the 116K and 11K bands were 
transferred to Immobilon membranes (Ploug et al., 1989) and 
amino-terminal sequences were determined using an Applied BioSystems 
470A protein sequencer coupled to an Applied BioSystems 120A PTH 
Analyser. 

Electron and immunoelectron microscopy. E. coli cells fixed in 2.5% 
glutaraldehyde in 0.1 M-phosphate buffer pH 7.5, for 1 h at 4 °C, were 
post-fixed with 2% osmium tetroxide in water and embedded in Epon 
(Epox-812; Fullam). Immunogold staining (IGS) was carried out by 
successive reactions of cell sections with primary anti-β-gal or 
anti-protease rabbit antibody overnight at 4 °C (at a dilution of 10 µg/ml in 
Tris-buffered saline), and 5 nm colloidal gold-labelled anti-rabbit IgG 
antibody for 1 h at room temperature (single IGS reaction). For double 
IGS, specimens were first simultaneously incubated with mouse 
monoclonal anti-β-gal antibody (Sigma) and rabbit polyclonal 
anti-protease antibody, then with 1 nm gold-labelled anti-rabbit IgG 
antibody and 5 nm gold-labelled anti-mouse IgG antibody (EM-GAM1 
and EM-GAM5; BioCell Research Lab). Specimens were post-stained 
with 0.5% uranyl acetate and examined under the Philips EM-300 
electron microscope. 



Results 

Expression of β-gal-fused and unfused HIV-1 protease 
and protease precursor in E. coli 

The level of expression of protease and β-gal synthesis 
in bacterial cells under the control of the araB promoter 
was estimated by SDS-PAGE and immunoblotting, and 
quantified by the β-gal assay. In E. coli expressing the 
unfused construct PR107, a band at 11K reacting with 
anti-protease antibody appeared after 1 h of arabinose 
induction (Fig. 2, lane 6) and progressively increased 
with the time of induction (Fig. 2, lanes 7 and 8). A 
similar pattern was obtained with the G33 mutant (Fig. 
2, lanes 9 to 12), whereas no specific 11K band was 
detected in the PR100 pattern (Fig. 2, lanes 1 to 4). With 
the fused construct PR107-β-gal, the anti-protease 
serum revealed a discrete band of 127K fusion protein as 
early as after 0.5 to 1 h of induction (Fig. 3a, lanes 5 and 
6) and an intense band at 11K became visible after 2 h 
(Fig. 3a, lanes 7 and 8). Several other discrete bands or 
intermediate cleavage products, ranging from 80K to 
20K in apparent Mr, were also seen on the blot (Fig. 3a, 
lanes 5 to 8). With the β-gal-fused mutant G33-β-gal, the 
127K fusion band increased in intensity during the 
induction period, without detectable release of the 11K 
protease (Fig. 3a, lanes 13 to 16). In contrast to PR107, 
PR107-β-gal, G33 and G33-β-gal, a low level of 
expression of protease and fused β-gal-protease was 
obtained with PR100 and PR100-β-gal (Fig. 2, lanes 1 to 
4; Fig. 4a, lanes 9 to 12), as well as with the downstream 
construct β-gal-PR107 (Fig. 3a, lanes 9 to 12). 
Immunoblotting with anti-β-gal antibody (Fig. 3b) confirmed 
the results obtained with the anti-protease serum (Fig. 
3a). β-Gal was expressed at much higher levels with 
PR107-β-gal, G33-β-gal and RR-β-gal than with the 
other fusions β-gal-PR107 and PR100-β-gal (Fig. 3b and 
4b). 

The results of immunoblots were confirmed by β-gal 
assays performed on cell extracts: the level of 
expression with upstream fusions PR107-β-gal, 
G33-β-gal and RR-β-gal was ninefold higher than with the 
downstream fusion β-gal-PR107, and 15-fold higher than 
with PR100-β-gal (Table 1). The common feature 
between all the highly expressed genes was the presence 
of the additional 5' sequence encoding the N-terminal 
precursor protease heptapeptide. To determine whether 
this enhanced expression of the cloned genes occurred at 
the transcriptional or post-transcriptional level, the 
overall quantity of protease mRNA expressed upon 
arabinose induction by the different fusion constructs 
was estimated by slot-blot hybridization, using 
32P-kinase-labelled oligonucleotide M6 as the ssDNA probe 
(Fig. 1 c). No significant difference was found in the 



Fig. 2. SDS-PAGE and immunoblotting analysis of unfused HIV-1 protease expressed in E. coli under the control of araB promoter. 
Cells expressing PR100 (lanes 1 to 4), PR107 (5 to 8) and G33 mutant (9 to 12) were harvested after 0.5, 1, 2 and 4 h of arabinose 
induction, respectively. Lane M, prestained marker lane (18K and 14K markers are shown). Blot was reacted with anti-HIV-1 protease 
rabbit serum and phosphatase-labelled secondary antibody. 



Table 1. Enzymic assay of β-gal-fused protease yields by E. 
coli cells harbouring various expression vectors* 

Vector          β-Gal activity (U/ml)   Ratio 1†   Ratio 2‡ 
pARAl-lacZ             610                1.0 
β-gal-PR107            595                0.9 
PR100-β-gal            313                0.5        1.0 
PR107-β-gal           4850                7.9       15.5 
G33-β-gal             5110                8.4       16.3 
RR-β-gal              4444                7.3       14.2 

* Figures in the table are values of β-gal activity, determined by 
hydrolysis of ONPG and expressed in units per ml (U/ml) of 
permeabilized cell extracts (Miller, 1972). Each value represents the 
average of five individual results obtained in triplicate experiments 
(S.E.M., 5 to 10% of the activity values). The β-gal-specific activity was 
1 × 10^5 U/mg, as estimated from stained gel scannings and ELISA. The 
value of 4444 to 4850 U/ml thus corresponded to 44 to 48 mg/l. 
† Ratio of protease-fused β-gal to pARAl-lacZ value. 
‡ Ratio of protease-fused β-gal to PR100-β-gal value. 
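
The two ratio columns of Table 1 follow directly from the activity column. The sketch below recomputes them from the tabulated U/ml values; it is an illustrative check rather than part of the original analysis, the dictionary keys are informal labels chosen for this example, and small differences in the last digit relative to the printed table are to be expected from rounding.

```python
# β-Gal activities (U/ml) as listed in Table 1; keys are informal labels.
ACTIVITY_U_PER_ML = {
    "lacZ control": 610,
    "beta-gal-PR107": 595,
    "PR100-beta-gal": 313,
    "PR107-beta-gal": 4850,
    "G33-beta-gal": 5110,
    "RR-beta-gal": 4444,
}


def ratios(activities, reference):
    """Divide every activity by the activity of the `reference` construct."""
    ref_value = activities[reference]
    return {name: value / ref_value for name, value in activities.items()}


# Ratio 1: relative to the lacZ control; Ratio 2: relative to PR100-β-gal.
ratio1 = ratios(ACTIVITY_U_PER_ML, "lacZ control")
ratio2 = ratios(ACTIVITY_U_PER_ML, "PR100-beta-gal")

if __name__ == "__main__":
    for name in ACTIVITY_U_PER_ML:
        print("%-16s %6.2f %6.2f" % (name, ratio1[name], ratio2[name]))
```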

amount of protease mRNA expressed by the various 
constructs (data not shown), suggesting that the 
enhancing effect occurred at the post-transcriptional level. 

The yield of HIV-1 protease obtained with the vector 
expressing the fusion construct PR107-β-gal was 
estimated by three indirect methods, all of them based on 
β-gal determination: (i) β-gal enzymic assays (Miller, 
1972), from calculated specific activity of 10^5 units per 
mg β-gal protein; (ii) ELISA, using commercial β-gal 
as the standard and anti-β-gal serum; (iii) Coomassie 
blue staining of the 127K/116K doublet band of 
protease-fused and released β-gal in SDS-PAGE, after calibrating 
the gel with purified β-gal. The protease (11K) represents 
about one-tenth of the mass of the fusion protein, in 
terms of polypeptide mass ratio (116K for β-gal). Thirty 
to 50 mg of PR107-β-gal fusion protein was obtained per 
litre of cell culture after 4 h of arabinose induction. 
Assuming a 100% efficiency for protease self-processing 
and release from the fusion protein, its theoretical 
recovery would be 3 to 5 mg of protease per litre, i.e. 1.2 
to 2.0 mg per g of E. coli wet weight. A similar protease 
recovery (1 mg/l) was obtained from E. coli expressing a 
maltose-binding protein fusion construct (Louis et al., 
1991). 
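
The yield estimate above is a two-step unit conversion, sketched here in Python for clarity. The constants are the values given in the text (specific activity of 10^5 U/mg β-gal; protease about one-tenth of the fusion protein mass); the function names and the `efficiency` parameter are assumptions introduced for this illustration.

```python
SPECIFIC_ACTIVITY_U_PER_MG = 1e5   # β-gal specific activity stated in the text
PROTEASE_MASS_FRACTION = 0.1       # "about one-tenth" of the fusion protein


def beta_gal_mg_per_litre(activity_u_per_ml):
    """Convert β-gal activity (U/ml) into β-gal protein (mg per litre)."""
    mg_per_ml = activity_u_per_ml / SPECIFIC_ACTIVITY_U_PER_MG
    return mg_per_ml * 1000.0


def protease_mg_per_litre(fusion_mg_per_litre, efficiency=1.0):
    """Theoretical protease recovery from a given amount of fusion protein.

    `efficiency` is the assumed fraction of fusion protein that is
    self-processed and released (1.0 corresponds to the 100% case).
    """
    return fusion_mg_per_litre * PROTEASE_MASS_FRACTION * efficiency


if __name__ == "__main__":
    for activity in (4444, 4850):
        print(activity, "U/ml ->", beta_gal_mg_per_litre(activity), "mg/l")
    for fusion in (30, 50):
        print(fusion, "mg/l fusion ->", protease_mg_per_litre(fusion), "mg/l protease")
```

Activities of 4444 to 4850 U/ml convert to roughly 44 to 48 mg of β-gal per litre, and 30 to 50 mg of fusion protein per litre correspond to 3 to 5 mg of protease per litre at full processing efficiency, matching the figures quoted in the text.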

Activity, cytotoxicity and cellular distribution of 
β-gal-fused and unfused protease and protease precursor in 
E. coli 

The results of the immunoblot analysis, showing 
evidence of HIV-1 protease release from the wild-type β-gal 
fusion constructs and accumulation of non-cleaved 
fusion protein with the G33-β-gal mutant (Fig. 3 and 4), 
suggested that the cloned protease was enzymically 
active and capable of self-processing in vivo. It could 
therefore be anticipated that protease synthesized in E. 
coli upon arabinose induction would also be active on its 
natural gag substrate. To verify its specificity of cleavage 
in vitro, our protease was incubated with 
baculovirus-expressed 41K gag polyprotein as a substrate (pr41gag). 
Pr41gag comprised the p17MA and p25CA domains 
(amino acids 1 to 375 of the gag sequence), with only two 
unique protease sites (Mervis et al., 1988), Tyr-Pro at the 
p17-p24 junction (amino acids 132 and 133), and 
Leu-Ala at the p24-p25 junction (positions 363 and 364), the 
Met-Met site at 377 and 378 being eliminated by carboxy 
truncation (Royer et al., 1991). 

As expected, the active site mutants G33 and 
G33-β-gal showed no detectable proteolysis of pr41gag (not 
shown), whereas all the other protease clones were found 
to cleave the gag polyprotein substrate at its two specific 
sites, yielding the characteristic anti-gag 
antibody-reacting p24-p25 doublet (Fig. 5). The highest activity 
was obtained with PR107 and PR107-β-gal, which 
converted almost all the gag precursor into p24-p25CAgag 
after 30 min of incubation (Fig. 5, lanes 3 and 7). The 
lower proteolytic activity shown by the extracts of cells 

Fig. 3. SDS-PAGE and immunoblotting analysis of β-gal-fused protease clones PR107-β-gal (lanes 1 to 4 and 5 to 8), β-gal-PR107 (9 to 
12) and G33-β-gal mutant (13 to 16), expressed in E. coli in the presence (+) or the absence (-) of arabinose inducer. The same blot was 
successively reacted with (a) anti-protease antibody and phosphatase-labelled conjugate, and (b) anti-β-gal antibody and its 
corresponding phosphatase-labelled conjugate. Cells were harvested after 0.5, 1, 2 and 4 h of induction. M, prestained Mr markers 
(BRL, high Mr range). Protease-β-gal fusion protein migrates as a 127K species, free β-gal and free protease as 116K and 11K protein 
bands, respectively. 



Fig. 4. SDS-PAGE and immunoblotting analysis of control pARA-lacZ (lanes 1 to 8) and β-gal-fused protease clones PR100-β-gal 
(lanes 9 to 12) and RR-β-gal mutant (13 to 16), expressed in E. coli. As in Fig. 3, the blot was successively reacted with (a) anti-protease 
antibody and phosphatase-labelled conjugate, and (b) anti-β-gal antibody and its corresponding phosphatase-labelled conjugate. Cells 
were maintained for 0.5, 1, 2 and 4 h in the presence (+) or absence (-) of arabinose. M, prestained Mr markers; β-gal, bacterial 
β-galactosidase (116K) from a commercial source. Note that the central portion of the blot, of no informative value, is not presented in this 
figure. 



Fig. 5. Activity of bacterially expressed HIV-1 protease on its natural substrate in vitro. Extracts from E. coli cells harvested after 4 h of 
arabinose induction were incubated at 27 °C with baculovirus-expressed gag polyprotein pr41gag, precursor to p24-p25CAgag. Samples 
were withdrawn at 5, 15, 30 and 60 min, and analysed in SDS-PAGE and immunoblotting with mouse monoclonal anti-gag precursor 
antibody (Epiclone 5001, Epitope) and phosphatase-labelled conjugate. Lane M, prestained Mr markers; lane 0, control gag polyprotein 
incubated for 60 min with extract from non-induced cells harbouring PR107. 
Lane groups, left to right: PR107, PR107-β-gal, PR100-β-gal and β-gal-PR107 (lanes 1 to 16). 

Fig. 6. HIV-1 protease expression and bacterial cell viability. Growth 
curves of E. coli cells harbouring various forms of β-gal-fused or unfused 
protease constructs were determined by measurements of OD600 
throughout the arabinose induction period. (a) Non-induced (open 
symbols) or arabinose-induced (closed symbols) PR107 (circles) and 
PR107-β-gal (triangles). (b) Non-induced (open symbols) or 
arabinose-induced (closed symbols) mutants G33 (circles) and G33-β-gal 
(triangles). 



expressing PR100-β-gal and β-gal-PR107 (Fig. 5, lanes 9 
to 12 and 13 to 16) reflected the lower levels of protease 
production from these two constructs (Fig. 2 to 4). 

However, the difference in fused and unfused protease 
expression observed between some of our clones could be 
due to cytotoxicity of the protease and a negative 
feedback effect on protein synthesis. In addition, c.p.e. 
has been frequently found to be related to the cellular 
distribution of cloned foreign proteins. Arabinose- 
induced cells were therefore analysed with respect to 
their growth rate, and the cellular localization of unfused 
and β-gal-fused protease was examined by 
immunoelectron microscopy. As shown in Fig. 6, no detectable 
cytotoxicity was observed with the fusion constructs, and 
expression of only unfused PR107 and G33 proteases 
resulted in a slight reduction of the growth rate after 4 h 
of induction. The similar patterns shown by the clones 
expressing the active PR 107 protease and the inactive 
G33 mutant suggested that the observed c.p.e. was due to 
the cellular accumulation of a foreign protein rather than 
to proteolytic activity of the protease per se. 

When lysates from cells expressing unfused protease or 
β-gal alone were fractionated, most of the β-gal and 



Fig. 7. Immunoelectron microscopy of PR107-β-gal fusion-expressing 
E. coli cells, harvested after 4 h of arabinose induction. (a) Single IGS 
with anti-β-gal rabbit antibody and 5 nm gold particle-labelled 
anti-rabbit IgG secondary antibody. (b) Single IGS with anti-protease 
rabbit antibody and 5 nm gold particle-labelled anti-rabbit IgG 
secondary antibody. The large open arrows point to antibody-reacting 
intracellular inclusion bodies. (c) Enlargement of an inclusion in cell 
doubly stained with primary anti-protease rabbit antibody and 
anti-β-gal mouse antibody, and 5 nm gold-labelled anti-mouse IgG (large open 
arrows) and 1 nm gold-labelled anti-rabbit IgG as the two secondary 
antibodies. Note that only a few 1 nm gold grains of the many more 
visible on the background are marked by thin arrows. Bar represents 
200 nm in (a) and (b), and 20 nm in (c). 



protease activities were recovered in the soluble supernatant
(not shown). No inclusion body was observed under
the electron microscope, and IGS showed that the cloned
proteins (protease or β-gal) were evenly distributed
within the E. coli cytoplasm (not shown). In contrast,
amorphous inclusions were found in cells expressing the
two fused constructs PR107-β-gal and G33-β-gal. These
intracellular inclusions reacted with both anti-β-gal and
anti-protease antibodies in single IGS reactions (Fig. 7a,
b). In the PR107-β-gal-expressing clone, the gold grain
pattern was found to be different with anti-β-gal and
anti-protease antibodies: β-gal molecules were observed
inside and outside the inclusions, whereas protease was
exclusively localized within the inclusion bodies (compare
Fig. 7a and b). Double IGS, using 1 and 5 nm
colloidal gold particles specific for rabbit (anti-protease)
and mouse (anti-β-gal) antibodies, respectively, confirmed
that the inclusions contained both β-gal and
protease molecules (Fig. 7c), and suggested that protease
remained associated with the inclusion and in the
vicinity of β-gal molecules after proteolytic processing.

Autoprocessing of HIV-1 protease: proteolytic cleavage at
the protease-β-gal junction

For PR107-β-gal, as well as PR100-β-gal, cleavage at the
F-P bond situated in the natural C-terminal site
TLNF*PISP placed at the protease-β-gal junction was
required to yield the mature protease (Fig. 1a). As
already shown in Fig. 3(a), the first band to appear at 0.5
h of induction in the PR107-β-gal pattern was a 127K
fusion protein detected with anti-protease antibody. A
protease band at 11K occurred in significant amounts
after 2 h of induction, concurrently with a doublet at
127K and 116K (Fig. 3b). The 116K species was detected
only with anti-β-gal antibody (Fig. 3b). No 11K band
was visible in the G33-β-gal mutant although large
quantities of fusion protein accumulated at 127K (Fig.
3a, b, lanes 15 and 16). All these data suggested that the
protease was released from the β-gal fusion protein by a
mechanism of specific self-processing at the C-terminal
site of the protease and not by proteolysis due to bacterial
proteases. Although produced in much lower yields, the
127K fusion band of PR100-β-gal was cleaved at the
same time of induction as that of PR107-β-gal (Fig. 4),
with the occurrence of detectable protease activity in the
cell lysates (Fig. 5, lanes 11 and 12). This result suggested
that, in the protease-β-gal fusion construct, cleavage at
the C-terminal F-P bond of protease was not significantly
influenced by the presence of cleaved or uncleaved
N-terminal precursor sequence.

Specificity of cleavage at the N-terminal SFNF*PQITL
and C-terminal TLNF*PISP natural sites of the protease
precursor

Compared to mature protease PR100, PR107 precursor
contained the additional N-terminal heptapeptide
GTVSFNF present in the pol reading frame (Fig. 1a). To
determine the specificity of cleavage at the N-terminal
end of the protease precursor PR107 and at both the N
and C termini of the β-gal-fused protease precursor
PR107-β-gal expressed in E. coli, the 11K band produced
in the PR107 clone and the 11K and 116K bands from
the PR107-β-gal fusion construct were isolated and
sequenced. The first six cycles showed the expected
sequence P-Q-I-T-L-W for the two 11K proteases and
P-I-S-P-G-G for the 116K processed β-gal. This confirmed
that protease could mature from its PR107 precursor, by
cleavage at the N-terminal site and the protease-β-gal
junction, both cleavages occurring at specific F-P bonds.

Phenotype of mutants in the N- and C-terminal 
autoprocessing sites 

Since any mutation upstream of the F-P scissile bond at
the protease-β-gal junction or downstream of the F-P
bond in the N-terminal site would alter the protease
sequence and could therefore affect its enzymic activity,
we restricted our mutational analysis to residues
downstream of the C-terminal automaturation site. The P1'
and P2' positions have been extensively studied (Margolin
et al., 1990; Partin et al., 1990; Tritch et al., 1991),
whereas the requirements for the P3', P4' and P5'
subsites have not yet been clearly defined. The pattern of
protease release by the RR-β-gal mutant, as shown in
Fig. 4(b), was similar to its wild-type PR107-β-gal
equivalent (Fig. 3b), although a slightly delayed processing
(0.5 to 1 h) could be observed. This suggested that the
presence of the two positively charged arginine residues
at subsites P4' and P5' in the C-terminal processing site
did not significantly impair the F-P scissile bond
cleavage.

To analyse the influence of the amino acid residue at
position P3' in the N-terminal site on protease
autoprocessing, the sequence TLNF*PQITL, consensus to the
natural sequence SFNF*PQITL found at the N terminus
of the protease, was introduced at the PR107-β-gal
junction. We constructed an appropriate vector, termed
PRamb3'LZ, which expressed a β-gal-fused protease in
which the protease sequence was flanked by two
consensus N-terminal processing sites, i.e. its natural
upstream SFNF*PQITL site and a downstream site with
an amber stop codon in place of the isoleucine codon
(TLNF*PQambTL; Fig. 1a). Any substitution at the P3'
subsite, downstream of the F-P bond, would therefore
not affect the protease sequence. The E. coli strain
harbouring PRamb3'LZ was then transformed by each of
12 plasmids expressing an amber suppressor tRNA, and
protease autoprocessing at the protease-β-gal junction
was assayed by the occurrence of the 116K β-gal band
released from the 127K protease-β-gal fusion in SDS-
PAGE and immunoblotting.
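
The subsite labels used in this section (P1, P2, ... upstream of the scissile bond and P1', P2', ... downstream of it) can be made concrete with a small sketch. It assumes only the notation already used in the text, where an asterisk marks the cleaved F-P bond; the function name `label_subsites` is a hypothetical helper introduced here for illustration and is not part of the original study.

```python
def label_subsites(site: str) -> dict:
    """Map each residue of a cleavage site such as 'SFNF*PQITL' to its subsite label.

    The '*' marks the scissile bond. Residues upstream of it are labelled
    P1, P2, ... moving away from the bond; residues downstream are labelled
    P1', P2', ... moving away from the bond.
    """
    if site.count("*") != 1:
        raise ValueError("site must contain exactly one '*' marking the scissile bond")
    upstream, downstream = site.split("*")
    labels = {}
    # Walk the upstream half backwards so that P1 is the residue next to the bond.
    for offset, residue in enumerate(reversed(upstream), start=1):
        labels[f"P{offset}"] = residue
    # Walk the downstream half forwards so that P1' is the residue next to the bond.
    for offset, residue in enumerate(downstream, start=1):
        labels[f"P{offset}'"] = residue
    return labels


n_site = label_subsites("SFNF*PQITL")  # natural N-terminal site
c_site = label_subsites("TLNF*PISP")   # natural C-terminal site

print("N-terminal site:", n_site)
print("C-terminal site:", c_site)
```

With this labelling, the isoleucine replaced by the amber codon is the P3' residue of the N-terminal site, which is the position probed by the suppressor tRNA experiment.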

As shown in Table 2, the efficacy of suppression,
determined from the β-gal activity, was found to vary
from 0.2% (lysine) to 41% (glycine) for the different
suppressor tRNAs, a result which confirmed previous
studies (Kleina et al., 1990b). Protease autoprocessing
was therefore analysed at late times of arabinose
induction (4 h), to compensate for the low level of β-gal
expression obtained with certain suppressors. Due to
amino acid mischarging by certain suppressor tRNAs
(Normanly et al., 1990), some amino acid substitutions

Table 2. Phenotype of amino acid substitutions at position P3' in the TLNF*PQambT
autoprocessing site placed at the protease-β-gal junction*

Suppressor tRNA                        β-gal activity   Suppression        Autoprocessing‡
                                       (U/ml)           efficiency† (%)    (%)

PR107-β-gal alone                      3631 (100%)      -                  100
PRamb3'LZ alone                        <2               0.0                -
PRamb3'LZ co-expressed with
  Gly2 (62% G; 37% E)                  1489             41.0               70-75
  HisA (94% H)                         803              22.1               70-75
  Thr2 (8% T; 86% K)                   172              4.7                100
  Ala2 (97% A)                         159              4.3                100
  GluA (59% E; 17% Q; 6% Y; 6% R)      129              3.5                100
  Phe (98% F)                          91               2.5                100
  Arg (91% K; 5% R)                    46               1.2                100
  Pro (85% P)                          38               1.0                60-70
  Val (84% K; 5% V)                    27               0.7                100
  Cys (90% C)                          23               0.6                100
  Ile2 (93% K)                         15               0.4                100
  Lys (94% K)                          7                0.2                100

* E. coli cells harbouring PRamb3'LZ were transformed with pct2 plasmid expressing one of the 12
suppressor tRNA genes listed in the table. The specificity and efficiency of the amino acid insertion by
each suppressor is indicated in parentheses (Normanly et al., 1990).

† The degree of amber suppression was estimated by β-gal activity, expressed as units per ml (U/ml;
Miller, 1972). Controls were PR107-β-gal (100% control for β-gal activity and autoprocessing) and
PRamb3'LZ alone (negative control for β-gal and suppression activity).

‡ Protease autoprocessing was estimated by scanning of the 127K/116K doublet bands of protease-fused
and protease-released β-gal in immunoblots of SDS-polyacrylamide gels.
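
The suppression efficiencies in Table 2 appear to follow directly from the β-gal activities, taking the PR107-β-gal control (3631 U/ml) as 100%. The sketch below illustrates that arithmetic for a few rows whose values are clearly legible in the table; the function name and the dictionary are illustrative assumptions, not part of the original analysis.

```python
# β-gal activity of the PR107-β-gal control, taken as 100% (Table 2).
CONTROL_ACTIVITY_U_PER_ML = 3631.0


def suppression_efficiency(activity_u_per_ml: float,
                           control: float = CONTROL_ACTIVITY_U_PER_ML) -> float:
    """Return the suppression efficiency (%) relative to the control activity."""
    return 100.0 * activity_u_per_ml / control


# β-gal activities (U/ml) for selected suppressor tRNAs, as listed in Table 2.
activities = {"Gly2": 1489, "Thr2": 172, "Phe": 91, "Lys": 7}

for suppressor, activity in activities.items():
    print(f"{suppressor}: {suppression_efficiency(activity):.1f}%")
```

The low efficiencies obtained for most suppressors are the reason the text gives for scoring autoprocessing only at late induction times.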



were under-represented (Table 2). Only three amino 
acids substituting for isoleucine at position P3' were 
found to have some discrete deleterious effect on the 
protease autoprocessing efficiency. Glycine, histidine 
(94% H inserted) and proline (85% P inserted) showed a 
slight but significant delay in protease processing. Gly2 
tRNA inserted glycine and glutamic acid with similar 
efficiency, but comparison with GluA tRNA, which 
mainly inserted glutamic acid, allowed us to assign the 
observed effect on processing to the glycine substitution. 



Discussion 

In the present study we analysed HIV-1 protease
autoprocessing using synthetic genes encoding a protease
precursor of 107 amino acids (PR107) and the mature
form of the enzyme (PR100), unfused and fused to the N
terminus of β-gal (PR107-β-gal and PR100-β-gal), or to
its C terminus (β-gal-PR107). It has to be kept in mind
that the concept of 'autoprocessing' used for retroviral
proteases has been based on the experimental observation
that wild-type protease is released from protease-
containing polyproteins expressed in E. coli, whereas the
mutant protease domain is not (Debouck et al., 1987,
1990; Krausslich & Wimmer, 1988; Loeb et al., 1989;
Mous et al., 1988; Strickler et al., 1989; and refer to
Fig. 3). However, such results do not unambiguously
prove that the PR domain embedded in a fusion
construct is itself an active protease. It cannot be
excluded that a low level of cellular proteolytic activity
might randomly cleave the fusion protein and lead to
release of a low amount of PR, which in turn acts in trans
to generate further PR.

Yields of protease and β-gal were found to be
significantly higher when the gene for PR107 was fused
to the 5' end of the E. coli lacZ gene (PR107-β-gal), and
comparison of the level of expression of unfused and
β-gal-fused mature forms (PR100 and PR100-β-gal)
suggested that the enhancing effect was post-transcriptional
and due to the additional 5' sequence of the PR107
gene. In contrast to a previous report which showed an
inactive form of β-gal-fused protease accumulating in
inclusion bodies in E. coli (Giam & Boros, 1988), all our
cloned proteases (except G33 and G33-β-gal protease
mutants) were recovered in an active form under mild,
physiological conditions, and none of them, not even the
inclusion-forming PR107-β-gal, required a solubilization
step in urea-containing denaturing buffer. Our data
are therefore reminiscent of previous observations on
high protease yields obtained with certain clones of
protease precursors encoding N-terminal sequences of
[illegible] or 56 residues upstream of Pro (1) (Debouck et al.,
[year illegible]). According to our results, an upstream sequence of seven
residues is sufficient to enhance the protease production
without altering protease activity.
Upstream fusion to β-gal, as in PR107-β-gal, did not
result in any detectable cytotoxicity for E. coli cells (Fig.
6), and a comparison of PR107- and PR107-β-gal-
expressing clones with the inactive mutants G33 and
G33-β-gal suggested that the PR107- or G33-induced
c.p.e. resulted from the expression of a foreign gene
product in E. coli rather than from the enzymic properties of
the protease itself. Only the two upstream fusion
constructs PR107-β-gal and G33-β-gal gave rise to
intracellular inclusions. This is in contrast to previous
reports in which intracellular precipitates were only
observed with mature protease or with protease precursors
containing both upstream and downstream sequences
expressed in heat-shock response-deficient E.
coli (Debouck et al., 1990). Immunoelectron microscopy
of the PR107-β-gal inclusions revealed that protease
remained associated with the inclusion bodies, even after
processing (Fig. 7), a phenomenon which could account
for the absence of toxicity observed with our β-gal-fused
protease gene constructs.

Processing at Phe-Pro bonds occurred in vivo with a
similar efficiency at the PR107-β-gal and PR100-β-gal
junctions, at the N terminus of the PR107 precursor and
at the N and C termini of the β-gal-fused protease
precursor PR107-β-gal. The N-terminal site was also
correctly processed when placed at the protease C-
terminal end. This suggests that each autoprocessing site
contains its own information for cleavage, independent
of protein context, and supports the hypothesis that
protease autoprocessing occurs as a result of intermolecular
interactions (Miller et al., 1990). However, this does
not imply that the N- and C-terminal processing are
independent events, since the assay used cannot distinguish
independence of these two reactions from a
situation in which the C-terminal cleavage is dependent
on the N-terminal cleavage, but the latter one occurs at a
higher rate (Krausslich et al., 1989; Strickler et al., 1989).
Moreover, a recent study using mutated fusion proteins
expressed in E. coli indicated that altering one of the
protease cleavage sites influences the cleavage at the
non-mutated site (Louis et al., 1991).

Upon substitution of the P-G dipeptide at positions
P4' and P5' in the C-terminal site by the two positively
charged amino acids R-R, protease release was only
slightly delayed. This result was not surprising if one
considers (i) the topography of residues P4' and P5',
positioned downstream of the F-P scissile bond and
outside of the active site cleft, (ii) the tolerance of an
arginine residue at the P4' position of the natural site at
the p9NC*-p6LI** junction (Debouck et al., 1990) and
(iii) the tolerance of an arginine residue at P1' position in
non-viral protein substrates of HIV-1 protease, e.g.
troponin C and calmodulin (reviewed in Tomasselli et
al., 1991).

The upstream and downstream boundaries of HIV-1
protease constitute its two cleavage sites, SFNF*PQIT
at its N terminus and TLNF*PISP at its C terminus,
respectively. In the present study, we have focused on the
subsite P3' of the N-terminal autoprocessing site of the
protease for the two following reasons. (i) It has been
previously shown that cleavage at the F-P bond
constituting the N terminus of the mature protease takes
place significantly faster than cleavage at its C-terminal
F-P bond (Strickler et al., 1989), and that the natural N-
terminal autoprocessing site VSFNF*PQITL had the
highest Vmax/Km ratio (Krausslich et al., 1989). (ii) As a
result of mutational analyses of native substrates of HIV-
1 protease (Partin et al., 1990; Tomasselli et al., 1991;
Tritch et al., 1991) and of its autoprocessing sites
(LeGrice et al., 1988; Loeb et al., 1989; Louis et al.,
1991), and from theoretical considerations on the
structure of the protease (Hellen et al., 1989; Swanstrom
et al., 1989), the following consensus sequence for an
autoprocessing site has been proposed: P4 (small and
hydrophobic), P3 (undefined), P2 (small), P1 (aromatic
or large and hydrophobic), P1' (proline), P2' (small and
hydrophobic), P3' (variable).

Amino acid residues at subsites P3 and P3' have been
postulated to be critical for the precise alignment of a
peptide substrate within the protease active site cleft
(Sali et al., 1989; Miller et al., 1990). In addition, once
such a ligand is positioned within the cleft, P3 and P3'
are adjacent to both flaps of the protease dimer (Erickson
et al., 1990; Harte et al., 1990; Lapatto et al., 1989; Miller
et al., 1989a, b; Moore et al., 1989; Navia et al., 1989;
Suguna et al., 1987; Weber et al., 1989; Wlodawer et al.,
1989). However, no experimental data are available on
the influence of the residue at the P3' subsite of the N-
terminal site on protease autoprocessing, due to the fact
that substitutions at this P3' position would change the
protease N-terminal sequence and therefore possibly
alter its proteolytic activity. This was the case when
aspartic acid at P3' in the N-terminal site of the protease
was substituted for isoleucine: no self-processing was
observed with the mutant, whereas cleavage could be
rescued in trans by the wild-type protease (Partin et al.,
1990).

For an indirect analysis of the effect of P3' substitutions
in the upstream site on protease autoprocessing, we
introduced a consensus N-terminal processing site at the
C-terminal extremity of the protease (at its junction with
β-gal), and substituted a suppressive amber mutant
codon for the isoleucine codon at position P3'
(TLNF*PQambT). When the amber mutation was
assayed for rescue by a series of 12 suppressor tRNAs,
only G, H and P substituting for I were found to reduce
the processing efficiency (Table 2). This suggested a
tolerance for a wide variety of amino acid residues at P3'
of the N-terminal processing site of HIV-1 protease,
except for amino acid substitutions with strong effects on
the polypeptide chain secondary structure, e.g. glycine or
proline. This implied a relatively high degree of
flexibility of the protease flaps in their contact with the
amino acid residue at P3'. The design of more efficient
HIV protease inhibitors should take into account all
available data on protease cleavage of viral and non-viral
natural substrates (Riviere et al., 1991), but the primary
and therefore essential event is indeed its own cleavage
and release by an autoprocessing mechanism.

Abstract CD45 is a receptor-like protein tyrosine phosphatase
critically involved in the regulation of initial effector functions in
B- and T-cells. The protein comprises two phosphatase (PTP)
domains in its cytoplasmic region. However, whether each PTP
domain has enzyme activity by itself or whether both domains are
required to build up a functional enzyme is unclear. We have
studied different constructions of human CD45 comprising the
two PTP domains, both separately and as a single protein, fused
to maltose-binding protein (MBP). In apparent contrast with
previous studies, we show that the first PTP domain of CD45
(when fused to MBP) may be a viable phosphatase in the absence
of the second domain. Phosphatase activity resides in the
monomeric form of the protein and is lost after proteolytic
cleavage of the fusion partner, indicating that MBP specifically
activates the first PTP domain. Furthermore, changes in the
optimal pH for activity with respect to wild-type CD45 suggest
that protein-protein interactions involving residues in the
neighbourhood of the catalytic site mediate enzyme activation.
© 1997 Federation of European Biochemical Societies.

Key words: Receptor-like protein tyrosine phosphatase;
Signal transduction; Maltose-binding protein; Fusion protein



1. Introduction 

The activation and regulation of reaction cascades in signalling
pathways leading to proliferation and differentiation in
eukaryotic cells is related to the balance of tyrosine
phosphorylation/dephosphorylation controlled by the action of
tyrosine kinases and phosphatases (reviewed in [1,2]). Receptor-
like protein tyrosine phosphatases (RPTP) exhibit a modular
structure that includes one or two intracellular PTP domains,
each homologous to soluble forms of monomeric phosphatases,
a single membrane-spanning segment and a variable
extracellular domain [3]. The PTP domain is organized as a
central eight-stranded β-sheet flanked on both sides by α-helices,
as revealed by the crystal structures of the soluble
phosphatases PTP1B and Yop51 [4,5]. At the center of the active
site, the highly conserved sequence motif (I/V)HCXAGXXR(S/T)
contributes the essential nucleophilic cysteine
residue and other functional groups required for
phosphotyrosine binding and catalysis [6,7].

CD45, a prototypic receptor-like PTPase, is a 180-220 kDa
protein highly expressed in hematopoietic cells. CD45 plays a
critical role in the response of leukocytes to antigen, where it
is involved in the early stages of the signal transduction
pathways [8] (for review [9-11]). The two intracellular PTP
domains of CD45 share about 40% sequence identity with
each other [3], and both contain a cysteine residue (Cys828 in
PTP-I; Cys1144 in PTP-II) within the conserved sequence
motif. However, several mutational studies have indicated that
Cys828 (but not Cys1144) is critical for phosphatase activity
[12,13]. The substitution Cys828→Ser completely abrogated
the enzymatic activity in recombinant forms [12-14] and in
cells [15], but the replacement Cys1144→Ser in the second
PTP domain resulted in a phosphatase with in vitro and in
vivo properties similar to those of the wild-type enzyme
[13,15]. These results suggest that only the first PTP domain,
but not the second, behaves as an active phosphatase. In
agreement with these observations, recombinant CD45
domain II alone yielded an inactive protein [13], and several
RPTPs lack the conserved cysteine within the second PTP
domain [16,17]. Furthermore, in RPTPs such as LAR and
HPTPα, the first PTP domain alone displayed a phosphatase
activity similar to that of the whole protein [18,19].

However, the first PTP domain of CD45 was found to be
inactive when expressed independently [12,13], suggesting that
in CD45, unlike other RPTPs, domain II is required for the
activity of domain I. This hypothesis is further substantiated
by the finding that a single point mutation in the second
domain or the deletion of the region linking the two PTP
domains totally abrogated the phosphatase activity of CD45
[20]. In contrast to these data, Tan et al. have shown in eukaryotic
cells that the second domain of CD45 together with the
C-terminal part of the first domain (without the catalytic
cysteine) may also be a viable phosphatase [21]. Therefore, it is
not clear at present whether the separate PTP domains of
CD45 have phosphatase activity by themselves, or whether
the enzymatic activity in one or both domains requires specific
interdomain interactions.

We have studied different constructions containing the two 
phosphatase domains of CD45, both separately and as a sin- 
gle protein, fused to the maltose-binding protein (MBP). We 
found that the first PTP domain of CD45 when expressed as a 
MBP fusion protein is an active phosphatase in the absence of 
the second domain, and that the enzymatic activity is lost 
after proteolytic cleavage of the fusion partner, suggesting 
that specific interactions with an external factor are required 
to stabilize the active conformation. 

2. Materials and methods 

2.1. Materials 

p-Nitrophenylphosphate (p-NPP), 2-β-mercaptoethanol (2-ME),
dithiothreitol (DTT) and D-maltose were purchased from Sigma.
Pefabloc® was from Boehringer Mannheim. Factor Xa, anti-MBP
antiserum and amylose resin were from New England BioLabs.
Bradford reagent was from Bio-Rad, and material for
SDS-polyacrylamide gel electrophoresis, gel filtration and ion
exchange chromatography was from Pharmacia.

Fig. 1. Different constructions of human CD45. MBP: maltose-binding protein. Numbering of residues as in Streuli et al. (1987) [22].



2.2. MBP-CD45 fusion constructs 

Three fusion constructs of human CD45 (PTP domain-I, PTP
domain-II and PTP domain-I+II) were generated according to the
boundaries proposed by Streuli et al. [22] (Fig. 1). Thus, MBP-PTP-I
includes amino acid residues 575-886 (K575IYDL...QALVE886) of
human CD45, MBP-PTP-II includes residues 940-1202
(Q940EENKS...DVIAS1202), and MBP-PTP-I+II includes residues
575-1202 (K575IYDL...DVIAS1202). The sequences were PCR
amplified from an expression plasmid containing the entire human CD45
T200 coding sequence using the primers:
5'-GGAATTCAGATCTAAAATCTATGATCTACAT-3' and
5'-GGAATTCAGATCTTTACTGATTGTATTCCACCAA-3' (PTP-I);
5'-CGGGATCCCAAGAAGAAAATAAAAGT-3' and
5'-CGGGATCCTTAGCTGGCAATGACGTCATA-3' (PTP-II); and
5'-GGAATTCAGATCTAAAATCTATGATCTAGAT-3' and
5'-GGAATTCAGATCTTTAGCTGGCAATGACGTCATA-3' (PTP-I+II).
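
The primers listed above carry short 5' extensions that look like restriction enzyme recognition sequences. As a hedged illustration, the sketch below scans two of the primers for the standard recognition sequences of EcoRI (GAATTC), BglII (AGATCT) and BamHI (GGATCC). The source does not state which enzymes were used to prepare the inserts, so the choice of enzymes here is an assumption, and `find_sites` is a hypothetical helper name.

```python
# Standard recognition sequences; which enzymes the authors actually used
# to digest the PCR products is not stated in the text (assumption).
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BglII": "AGATCT",
    "BamHI": "GGATCC",
}


def find_sites(primer: str) -> dict:
    """Return {enzyme: [0-based start positions]} for every site found in the primer."""
    # Remove spaces and line-break hyphens left over from typesetting.
    seq = primer.replace(" ", "").replace("-", "").upper()
    hits = {}
    for enzyme, motif in RECOGNITION_SITES.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            start = seq.find(motif, start + 1)
        if positions:
            hits[enzyme] = positions
    return hits


ptp1_forward = "GGAATTCAGATCTAAAATCTATGATCTACAT"
ptp2_forward = "CGGGATCCCAAGAAGAAAATAAAAGT"

print("PTP-I forward:", find_sites(ptp1_forward))
print("PTP-II forward:", find_sites(ptp2_forward))
```

Such a scan is only a sanity check on the transcribed primer sequences; it says nothing about which sites were actually cut during cloning.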

Amplified PCR fragments were ligated to pPDXa [23] which had
been previously linearized with BamHI, treated with calf intestinal
phosphatase, and gel purified. pPDXa contains the coding sequence
for maltose binding protein (MBP) under the control of the maltose
promoter and is a derivative of pMALcRI (New England BioLabs,
Beverly, MA). All the plasmids were transformed into competent
malE⁻ PD28 cells according to established protocols [24].

2.3. Expression and purification of recombinant forms of human CD45

Cultures of PD28 cells, induction and isolation of fusion proteins
were performed at 30°C using protocols previously described [25].
Protein purification was achieved by affinity chromatography on
amylose resin as described by the supplier, followed by ion exchange
chromatography using a MonoQ column (Pharmacia) equilibrated
in a buffer containing 50 mM Tris-HCl (pH 8.0), 50 mM NaCl, and
2 mM DTT. The presence of the different MBP-CD45 forms was
confirmed by anti-MBP Western blot. Cleavage of purified fusion
proteins was carried out by overnight incubation at 4°C with Factor Xa
(1:100, w:w). The PTP domains were separated from MBP by anion
exchange chromatography as described above. Protein purity
was tested by SDS-PAGE and the respective concentrations measured
by Bradford colorimetric assay.

2.4. PTPase assays 

Michaelis-Menten kinetics were studied using p-NPP (10 mM
final) as substrate in 50 mM imidazole (pH 7.0), 150 mM NaCl, 1 mM
EDTA, and 0.1% 2-ME (total volume of 50 µl) at 37°C. The reaction
was allowed to proceed for 5-20 min and quenched with 500 µl of 1 M
NaOH. The amount of dephosphorylated substrate was calculated
from the absorbance at 405 nm assuming a molar extinction
coefficient ε405 = 18000 M⁻¹ cm⁻¹ [5]. Michaelis-Menten constants were
calculated using the non-linear regression program ENZFITTER [26].
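
The conversion from absorbance at 405 nm to the amount of dephosphorylated substrate follows the Beer-Lambert law. The sketch below works through that arithmetic using the extinction coefficient quoted above and a final volume of 550 µl (50 µl reaction plus 500 µl NaOH). The 1 cm path length, the example absorbance and the function names are assumptions made for illustration; the source does not report the cuvette geometry or raw readings.

```python
EPSILON_405 = 18000.0      # M^-1 cm^-1, extinction coefficient quoted in the text
FINAL_VOLUME_UL = 550.0    # 50 µl reaction + 500 µl of 1 M NaOH
PATH_LENGTH_CM = 1.0       # assumed; not stated in the source


def product_nmol(absorbance: float,
                 volume_ul: float = FINAL_VOLUME_UL,
                 path_cm: float = PATH_LENGTH_CM,
                 epsilon: float = EPSILON_405) -> float:
    """Amount of p-nitrophenolate (nmol) in the quenched sample, via Beer-Lambert."""
    concentration_molar = absorbance / (epsilon * path_cm)  # mol per litre
    volume_litres = volume_ul * 1e-6
    return concentration_molar * volume_litres * 1e9        # mol -> nmol


def rate_nmol_per_min(absorbance: float, minutes: float) -> float:
    """Average rate of dephosphorylation over the incubation time."""
    return product_nmol(absorbance) / minutes


# Hypothetical reading: A405 = 0.9 after a 10 min incubation.
print(product_nmol(0.9))             # nmol of product in the sample
print(rate_nmol_per_min(0.9, 10.0))  # nmol/min
```

Rates obtained this way at several substrate concentrations would be the input to a non-linear Michaelis-Menten fit of the kind the text attributes to ENZFITTER.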

The effect of pH on the PTPase activity of the recombinant forms
of CD45 was studied using 10 mM p-NPP as substrate under similar
conditions as above. Buffers used for these tests were as follows: pH
4-5.25, 100 mM acetate; pH 5.5-6.25, 100 mM citrate; pH 6.5-7.25,
100 mM imidazole; pH 7.5-8.5, 100 mM Tris-HCl. All buffers
contained 1 mM EDTA, 0.1% 2-ME and 50 mM NaCl. pH dependence
data were fitted by non-linear least-squares regression using the
program KaleidaGraph 3.0.



3. Results and discussion 

The cytoplasmic region of CD45 (PTP-I+II) as well as each
separate PTP domain (PTP-I and PTP-II) (Fig. 1) were
expressed as MBP fusion proteins in bacteria under control of
the maltose promoter, and purified to near homogeneity (Fig.
2). Each construction was tested for PTPase activity using
p-NPP as a substrate. Recombinant proteins containing the
entire cytoplasmic region of CD45 (MBP-PTP-I+II and
PTP-I+II) displayed a phosphatase activity (Table 1) comparable

Fig. 2. SDS-PAGE analysis of MBP-CD45 constructions. A: MBP-PTP-I+II; B: MBP-PTP-I; C: MBP-PTP-II. Lane 1: after amylose affinity
chromatography. Lane 2: after factor Xa treatment. Lane 3: after anion exchange chromatography.

Fig. 3. Progressive loss of PTPase activity of MBP-PTP-I after
treatment with factor Xa. MBP-PTP-I (□) and MBP-PTP-I+II (♦)
were incubated with factor Xa for different times and their respective
PTPase activities determined afterwards. pNPP (10 mM final) was
used as substrate. Each point represents the mean value of
quadruplicate experiments and was calculated as the percentage of
dephosphorylation of pNPP exhibited. 100% activity was taken as
the activity displayed under identical conditions of incubation but in
the absence of factor Xa.

to similar constructions studied by other authors [6,27,28].
Also, as expected, no enzymatic activity was detected in the
recombinant proteins lacking the first phosphatase domain
(MBP-PTP-II and PTP-II). However, MBP-PTP-I was found
to be an active phosphatase, in contrast with previous work
that reported the lack of catalytic activity of PTP-I in the
absence of the second domain [12,13]. Both MBP-PTP-I
and MBP-PTP-I+II displayed Km values in the mM range,
which are similar to the values reported by other authors for
CD45 and other PTPases [6,27,28]. As shown in Table 1, Vmax
and kcat/Km values of MBP-PTP-I and MBP-PTP-I+II differ by
about one order of magnitude, indicating that the absence of
the second domain influences, but does not abrogate, the
phosphatase activity of the first PTP domain. Similar results
were obtained when two pY-peptides from hirudin and gastrin
(Tyrosine Phosphatase Assay Kit, Boehringer) were used as
substrates (data not shown).

Functional studies of different CD45 constructions
suggested that specific interactions with the second domain are
necessary to have a first active domain [13,20,21]. In a similar
way, the phosphatase activity of MBP-PTP-I protein (in the
absence of the second domain) might be accounted for either
by specific interactions of PTP-I with MBP or by protein
dimerization as observed, for example, in receptor-associated
protein tyrosine kinases [29,30]. In order to test whether the
covalently linked MBP was involved in PTP-I activation,
MBP-PTP-I was treated with Factor Xa to separate the
fusion partners. This experiment revealed a progressive loss of
PTPase activity following the addition of Factor Xa (Fig. 3),
indicating that the first PTP domain alone was not an active
phosphatase. On the other hand, cleavage of the MBP moiety
from the fusion protein containing the entire cytoplasmic
region did not significantly affect the PTPase activity.

The possibility that the activation of MBP-PTP-I
could arise from protein dimerization was also
investigated, since previous experimental evidence indicated
that CD45 could be regulated by dimerization. For example,
using cross-linking reagents in YAC-1 cell lysates, Takeda et
al. detected CD45 homodimers apparently induced by a
CD45-associated protein [31]. Furthermore, epidermal growth
factor (EGF)-induced dimerization of an artificial EGF/CD45
chimera expressed in a CD45-deficient T-cell line caused the loss
of antigen-dependent activation [32]. Using gel filtration
chromatography at different MBP-PTP-I concentrations, we
observed monomers, dimers and higher oligomers of MBP-PTP-
I. However, we have only found PTPase activity in fractions

Fig. 4. The phosphatase activity is associated with the monomeric form of MBP-PTP-I after denaturation/renaturation. Aggregated (inactive)
MBP-PTP-I was treated with 2.5% SDS and 5 M urea and renatured by gel filtration (Superdex 200 SMART®). The fractions corresponding
to the monomer, dimer and higher oligomeric forms were incubated with p-NPP (10 mM final) at 37°C for 12 h. The apparent PTP activity is
presented by bars.

Fig. 5. Ribbon model of the active site of monomeric phosphatases. The nucleophilic cysteine residue is labelled. In our construction, the MBP
moiety is fused to the NH2 terminus of the PTP domain (indicated with an arrow) and could therefore interact with neighbouring PTP loops
(shown in darker colour) containing functionally critical residues (Asp, Gln). The figure was drawn with MOLSCRIPT [36].



corresponding to the monomeric form, even after denatura- 
tion/renaturation of MBP-PTP-I (Fig. 4). These observations 
are consistent with the crystallographic study of RPTPa, 
which demonstrated that homodimerization of the first PTP 
domain inactivated the enzyme by blocking the access of sub- 
strate to the catalytic site [33). 

At this point, our results demonstrated that: (i) the N-terminal PTP
domain of CD45 can be an active phosphatase in the absence of the
second domain, (ii) the PTP activity is only evidenced when PTP-I is
covalently bound to MBP (or to PTP-II in the wild-type enzyme) and
(iii) the monomeric form of MBP-PTP-I is responsible for the
phosphatase activity. In the light of these results, we may speculate
that MBP activates the first PTP domain of CD45 through specific
protein-protein interactions. This interaction probably involves
contact residues close to the active site cleft and could compensate,
at least partially, for the putative contacts induced in wild-type
CD45 by the presence of the second PTP domain. A structural model of
the MBP-PTP-I protein provides additional support for this hypothesis.
In the fusion protein, the C-terminal region of MBP is connected
through a 22 amino acid-long linker to the N-terminal segment of
PTP-I, a region which folds into an α-helix close to the
substrate-binding cleft in monomeric phosphatases (Fig. 5). In
particular, two loops of the PTP domains close to the N-terminal
α-helix contain important functional residues. One of these loops has
two glutamine residues (corresponding to Gln 872 and Gln 876 in
CD45-PTP-I) which are highly conserved in the family of protein
tyrosine phosphatases and which have been proposed to be involved in
substrate binding interactions [4,5]. A second loop close to the
N-terminal α-helix of monomeric phosphatases contains the putative
proton donor (Asp 796 in CD45) and undergoes a significant
conformational change upon the binding of phosphotyrosine to the
active cleft (reviewed in [7]). Given their proximity to the
N-terminal α-helix (and therefore to MBP), these loops might be
involved in interdomain contacts that could account for PTP-I
activation.

Table 1

Kinetic constants for dephosphorylation of p-NPP by different CD45 constructions

Construction      Km (mM)        Vmax (nmol/min)   kcat/Km (s^-1 M^-1)
MBP-PTP-I a       0.8 ± 0.2      22.98 ± 2.1       1233                   37
MBP-PTP-I+II      1.75 ± 0.24    348.6 ± 1.2       12297                  37
PTP-I+II b        4.8            36                11000                  25

a Values were corrected assuming that only the monomeric form of MBP-PTP-I was an active PTPase.
b Values taken from Cho et al. (1993) [27].

Fig. 6. Dephosphorylation of pNPP for the different active forms of
CD45 as a function of pH. The solid line and (□) represent
MBP-PTP-I, the dashed line and (×) represent PTP-I+II, and the dotted
line and (♦) represent MBP-PTP-I+II.
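As a rough illustration of what the constants in Table 1 imply, the sketch below evaluates the Michaelis-Menten rate law for each construction at the 10 mM p-NPP concentration quoted in the legend to Fig. 4. The assignment of the first two table columns to Km (mM) and Vmax (nmol/min) is read from partly garbled column headers and should be treated as an assumption, as should the use of simple Michaelis-Menten kinetics; the function and variable names are hypothetical.

```python
# Minimal sketch: Michaelis-Menten saturation from the Table 1 constants.
# Assumption: column 1 is Km in mM and column 2 is Vmax in nmol/min.

TABLE_1 = {
    "MBP-PTP-I": {"km_mM": 0.8, "vmax": 22.98},
    "MBP-PTP-I+II": {"km_mM": 1.75, "vmax": 348.6},
    "PTP-I+II": {"km_mM": 4.8, "vmax": 36.0},
}


def mm_rate(s_mM, km_mM, vmax):
    """Initial rate v = Vmax * [S] / (Km + [S]), in the units of vmax."""
    return vmax * s_mM / (km_mM + s_mM)


def fraction_of_vmax(s_mM, km_mM):
    """Fraction of Vmax reached at substrate concentration s_mM."""
    return s_mM / (km_mM + s_mM)


if __name__ == "__main__":
    substrate_mM = 10.0  # p-NPP concentration from the Fig. 4 legend
    for name, k in TABLE_1.items():
        sat = fraction_of_vmax(substrate_mM, k["km_mM"])
        rate = mm_rate(substrate_mM, k["km_mM"], k["vmax"])
        print(f"{name}: {sat:.0%} of Vmax, v = {rate:.1f} nmol/min")
```

Under these assumptions, a lower Km means the construction is closer to saturation at a fixed substrate concentration, which is the only point the sketch is meant to make.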

Modifications of the local structural environment close to, or
within, the active site cleft might affect the pKa value of
functionally critical residues. The study of the pH dependence of the
enzymatic activity could therefore provide indirect evidence about
MBP-contact regions in the neighbourhood of the PTP-I active site. As
expected, the two constructions containing the entire cytoplasmic
region (MBP-PTP-I+II and PTP-I+II) displayed similar bell-shaped
profiles with a maximum at pH 5-5.2 (Fig. 6). These values are within
the pKa ranges observed for other PTPases [6] and confirm our previous
results (Fig. 3) showing that the presence of MBP did not affect the
catalytic properties of PTP-I+II. However, the absence of the second
PTP domain caused a displacement of the activity curve towards higher
pH values, with an optimal activity at pH 5.8-6.2 for the MBP-PTP-I
protein. These differences are apparently due to a change in the
ascending slope of the activity curve and suggest that a region near
the active site of the PTP-I domain might directly interact with the
MBP moiety in the fusion protein. In the case of wild-type CD45,
similar interactions between the two PTP domains could serve to
regulate the PTPase activity [13,20,21], although other mechanisms of
regulation, such as phosphorylation and external ligand binding, have
also been proposed [34,35]. Ultimate validation of this hypothesis can
only be provided by further biochemical and structural studies of CD45
and other members of the RPTP family.

In conclusion, our results demonstrate that the first PTP domain of
CD45 may be an active phosphatase in the absence of the second
domain. However, a functional PTP-I domain requires specific
protein-protein interactions with an additional factor (MBP or
PTP-II), a particular feature of CD45 that may differentiate it from
other receptor-like transmembrane phosphatases.

Feline leukemia virus contains a protease which apparently has the same specificity as murine leukemia virus
protease. It cleaves in vitro the Pr65gag of Gazdar-mouse sarcoma virus into the constituent p15, p12, p30, and
p10 proteins. We purified the protease and determined its NH2-terminal amino acid sequence (the first 15
residues). Alignment of this amino acid sequence with the nucleotide sequence (I. Laprevotte, A. Hampe, C.
H. Sherr, and F. Galibert, J. Virol. 50:884-894, 1984) reveals that the protease is a viral-coded enzyme and
is located at the 5' end of the pol gene. As previously found for murine leukemia virus (Y. Yoshinaka, I. Katoh,
T. D. Copeland, and S. Oroszlan, Proc. Natl. Acad. Sci. U.S.A. 82:1618-1622, 1985), feline leukemia virus
protease is synthesized through in-frame suppression of the gag amber termination codon by insertion of a
glutamine in the fifth position, and the first four amino acids are derived from the gag gene.



Feline leukemia virus (FeLV) is a non-genetically transmitted
exogenous retrovirus shown to be associated with disease in domestic
cats (4). As is characteristic of most retroviruses, FeLV genomic RNA
contains three genes, gag, pol, and env, which are necessary for viral
replication. The gag gene is translated into the polyprotein precursor
Pr65gag, which is processed to the structural proteins p15, p12, p30,
and p10; the pol gene encodes an RNA-dependent DNA polymerase, and the
env gene encodes the envelope glycoproteins of the virion surface (2).
Based on the order of the gag gene-coded structural proteins, protein
sequence homology, and immunological relatedness with murine leukemia
virus (MuLV), FeLV is classified as type C, subgroup 1 (15). Further,
Laprevotte et al. (10) reported a nucleotide sequence of 2,565 base
pairs which includes a portion of the 5' long terminal repeat, the gag
leader, the complete gag gene, and 389 base pairs of the pol gene.
Their data indicated that FeLV gag and pol genes are translated in
different reading frames. Recently, we purified and sequenced Moloney
(Mo)-MuLV protease responsible for the proteolytic processing of
precursor polyprotein Pr65gag (18). The results showed that this
protease is encoded by the gag-pol gene and synthesized within
Pr180gag-pol through suppression of the amber termination codon
located at the end of the gag gene.

In this report we describe the purification and partial sequence of a
protease from FeLV, its location in the viral genome, and the
translational control for its synthesis. FeLV (Rickard strain AB) was
grown in feline lymphoblasts (16) and purified by sucrose density
gradient centrifugation (13). In earlier studies we demonstrated
Mo-MuLV protease activity under assay conditions which involved
endogenous substrate, i.e., uncleaved Pr65gag of Mo-MuLV (19) released
by disruption of the virus by Nonidet P-40. Subsequently, we adopted
for routine analysis a method of assaying Mo-MuLV protease (20) with
an exogenous substrate, Gazdar-mouse sarcoma virus (Gz-MSV) Pr65gag
(6). Attempts to detect FeLV protease activity by the former method
were unsuccessful because of the extremely low levels of uncleaved
Pr65gag in purified FeLV. Therefore, in the present studies designed
to purify FeLV proteolytic enzyme, we used Gz-MSV Pr65gag to assay
FeLV protease activity. The suitability of Gz-MSV Pr65gag as a
substrate was expected because the cleavage sites in the mouse and
feline mature gag proteins are very similar based on protein and
nucleotide sequence data available for both systems (10, 11, 13, 17).
The previously described methods for protease assay (18) and
purification were used without major modifications.

To purified FeLV (158 mg) suspended in 2 ml of STE buffer (0.13 M
NaCl, 0.01 M Tris hydrochloride [pH 7.2], and 0.001 M EDTA), 20
volumes of cold acetone (-70°C) were added, and the suspension was
centrifuged at 4,000 × g for 10 min at 4°C. The precipitate was dried
in vacuo. To solubilize the protease, the acetone powder was extracted
(at 4°C for 30 min, stirred continuously) with 4 ml of TD buffer
(0.02 M Tris hydrochloride [pH 7.0], 5 mM dithiothreitol [Sigma
Chemical Co., St. Louis, Mo.]) containing 1.0 M NaCl. The extract was
centrifuged at 20,000 × g for 20 min at 4°C. The supernatant was then
fractionated on a Sephacryl S-200 column (2.5 by 90 cm) with TD
buffer, and the protease activity was determined as described above by
adding 100 µl of each fraction to 15 µg of Gz-MSV substrate in 1%
Nonidet P-40. The protease-active fractions were pooled, lyophilized,
and then further fractionated by reverse-phase high-pressure liquid
chromatography (RP-HPLC) on a Bondapak C18 column (0.39 by 30 cm)
(Waters Associates, Inc., Milford, Mass.). The protease activity was
eluted with about 33% acetonitrile (Fig. 1A; fraction 30) and was
detected by assaying lyophilized 5% aliquots of the fractions. When
fractions 26 to 35 were incubated with disrupted Gz-MSV, fractions 27
to 33 cleaved Pr65gag into what appear to be the mature proteins p30,
p15, p12, and p10 (Fig. 1B). The peak activity appeared in fraction
30. In addition, in fractions 28 and 32 intermediate cleavage
products, presumably Pr40gag (p30 plus p10) and Pr27gag (p15 plus
p12), were produced as observed in previous studies with Mo-MuLV
protease (20). The purified protein, the majority of which eluted in
fractions 29 to 31, showed a single band in sodium dodecyl
sulfate-polyacrylamide gel electrophoresis. From this analysis, the
total protein was estimated to be approximately 7 µg.

FIG. 1. Purification of protease by RP-HPLC. (A) Absorbance
profile. Sephacryl S-200 chromatography fractions were applied to a
Bondapak C18 column and then eluted with increasing concentration
of acetonitrile as follows: 0 to 20% acetonitrile over 20 min, 20 to
30% acetonitrile over 30 min, and 30 to 60% acetonitrile over 90 min
at a constant flow rate of 1.0 ml/min. Absorbance was measured at
235 nm or at 206 nm as indicated. (B) Assay of RP-HPLC fractions
for protease activity. One-twentieth of each fraction was lyophilized
and assayed for protease activity as described in the text. Proteins
were visualized by staining with Coomassie brilliant blue R-250.
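The three-step acetonitrile gradient in the legend to Fig. 1 can be written as a piecewise-linear function of time. The sketch below assumes that the three segments run back to back starting at time zero (the legend does not say whether any hold or delay was used) and asks when the gradient reaches the roughly 33% acetonitrile at which the protease eluted. Function and variable names are hypothetical.

```python
# Minimal sketch of the Fig. 1 gradient, assuming contiguous linear segments
# that start at t = 0 min with no hold steps or system delay.

# (start_min, end_min, start_pct, end_pct)
SEGMENTS = [
    (0, 20, 0, 20),     # 0 to 20% acetonitrile over 20 min
    (20, 50, 20, 30),   # 20 to 30% over 30 min
    (50, 140, 30, 60),  # 30 to 60% over 90 min
]


def acetonitrile_pct(t_min):
    """Programmed acetonitrile percentage at time t_min."""
    for t0, t1, p0, p1 in SEGMENTS:
        if t0 <= t_min <= t1:
            return p0 + (p1 - p0) * (t_min - t0) / (t1 - t0)
    raise ValueError("time outside the programmed gradient")


def time_at_pct(pct):
    """Time (min) at which the programmed gradient reaches pct."""
    for t0, t1, p0, p1 in SEGMENTS:
        if p0 <= pct <= p1:
            return t0 + (t1 - t0) * (pct - p0) / (p1 - p0)
    raise ValueError("percentage outside the programmed gradient")


if __name__ == "__main__":
    print(time_at_pct(33))       # programmed time for 33% acetonitrile
    print(acetonitrile_pct(35))  # composition 35 min into the run
```

The programmed time is not the same as the observed retention time, which also depends on column and system volumes that the text does not report.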



To determine the NH2-terminal amino acid sequence of the protease,
approximately 50 pmol of RP-HPLC-purified protein was subjected to
automated Edman degradation in a gas-phase sequenator (9) with the
program supplied by the manufacturer (Applied Biosystems, Inc.).
Conversion of the anilinothiazolinone amino acids to the
phenylthiohydantoin amino acids was accomplished with 25%
trifluoroacetic acid in water. Phenylthiohydantoin amino acids were
identified and quantitated by RP-HPLC (8). The first 15 residues of
the NH2-terminal amino acid sequence were determined. The residue
assignments together with the quantitative recoveries are given in
Fig. 2A.

To examine whether the protease protein is viral coded, we aligned the
experimentally determined sequence with the amino acid sequence
deduced from the DNA nucleotide sequence (10). The protease amino acid
sequence begins with asparagine coded by triplet 2072 to 2074 and
overlaps with the last four amino acids of the gag region (Fig. 2B).
However, the third amino acid residue of protease was glycine and not
glutamic acid, as predicted from the nucleotide sequence shown in the
alignment. The fifth amino acid, glutamine, corresponds to the gag
termination codon TAG positioned at nucleotides 2084 to 2086. This is
followed by a glutamic acid residue coded by the first triplet of the
pol gene. The nucleotide sequence then continues in the gag reading
frame and matches the protein sequence
Thr-Gln-Gly-Gln-Asp-Pro-Pro-Pro-. At nucleotide 2113, however, we
encounter a TGA stop codon in the DNA sequence. But if we remove one
of the eight consecutive C residues from the sequence occupying
positions 2104 through 2111, the nucleotide sequence matches both the
FeLV protease sequence as determined here and the previously reported
MuLV protease sequence (18) beyond nucleotide 2111. The result is that
FeLV protease is now in the same frame as gag and reverse
transcriptase. This suggests that the DNA clone of FeLV strain B
reported by Laprevotte et al. (10) is a noninfectious clone which,
like many cloned DNAs of retroviruses, is defective. The protease
NH2-terminal amino acid sequence was found to be different from the
nucleotide sequence of strain B at position 3 (Gly-Glu) and at
position 7 (Thr-Ser). This may indicate strain differences and
suggests that the major component of FeLV(AB) is strain A; the
NH2-terminal amino acid sequence analysis data of p10 (3) and p12
(unpublished data) also show differences from the strain B nucleotide
sequence. In conclusion, these results show that the FeLV protease is
a viral-coded enzyme and that it is synthesized by readthrough of the
amber termination codon as in the murine system (18).
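The frame argument above can be replayed on the stretch of DNA printed in Fig. 2B. The sketch below is only an illustration under stated assumptions: the sequence is transcribed from the figure as it survives here (with the character "6" read as G), the C run is taken to be ten residues long because that is what reproduces both predicted amino acid lines printed in the figure (the text itself speaks of eight consecutive C residues), and the codon table lists only the codons that occur in this stretch. All names are hypothetical.

```python
# Minimal sketch: amber readthrough and the one-C deletion discussed in the text.
# Assumptions: sequence transcribed from Fig. 2B; run of ten C residues;
# partial codon table covering only the codons that occur here.

CODONS = {
    "CTC": "L", "AAC": "N", "TTA": "L", "GAA": "E", "GAT": "D",
    "GAG": "E", "AGT": "S", "CAG": "Q", "GGC": "G", "GAC": "D",
    "CCC": "P", "CCT": "P", "GCC": "A", "AGG": "R", "ATA": "I",
    "TAG": "*", "TGA": "*",
}

UPSTREAM = "CTCAACTTAGAAGATTAGGAGAGTCAGGGCCAGGA"  # ends just before the C run
DOWNSTREAM = "TGAGCCCAGGATA"                       # starts just after the C run


def region(n_c):
    """Rebuild the stretch with a run of n_c consecutive C residues."""
    return UPSTREAM + "C" * n_c + DOWNSTREAM


def translate(dna, amber_as="*"):
    """Translate in frame 1; amber_as sets how the TAG codon is read."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[i:i + 3]
        protein.append(amber_as if codon == "TAG" else CODONS[codon])
    return "".join(protein)


as_printed = translate(region(10))               # gag frame, no suppression
corrected = translate(region(9), amber_as="Q")   # one C removed, amber read as Gln

determined = "NLGDQETQGQDPPPE"   # the 15 residues found by Edman degradation
deduced = corrected[1:16]        # protease starts at the Asn codon
mismatches = [i + 1 for i, (a, b) in enumerate(zip(deduced, determined)) if a != b]

if __name__ == "__main__":
    print(as_printed)
    print(corrected)
    print(mismatches)
```

With these inputs the printed sequence gives two stops in the gag frame, whereas the corrected one reads through to Pro-Pro-Pro-Glu-Pro-Arg-Ile and differs from the determined sequence only at the two positions named in the text.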

When we align the amino acid sequences of FeLV and MuLV protease
deduced from the DNA sequence, 80% homology (25 different of 125
residues) is observed. Furthermore, the FeLV protease cleavage
products of Pr65gag from Gz-MSV made sense when we compared the feline
cleavage sites with the murine cleavage sites between the gag proteins
(p15, p12, p30, and p10); each feline cleavage site is very similar to
the corresponding murine site (Fig. 2C). In both FeLV and Mo-MuLV
systems (18), a single protease is responsible for the complete
proteolytic processing of Pr65gag, which is temporally linked to virus
maturation. This conclusion regarding specificity is supported by the
similarity in the chemical structure of all the cleavage sites (Fig.
2C). Although the peptide bonds cleaved (viz., tyrosyl-proline between
p15 and p12, phenylalanyl-leucyl-proline between p12 and p30,
leucyl-alanine between p30 and p10, and leucyl-asparagine-threonine
between p10 and protease) are not identical, the carboxyl-terminal
amino acid sequences of the cleavage products are strikingly similar.
They suggest a consensus (14) as follows: the penultimate residue
(C-2) to the newly generated carboxyl terminus (C-1) is always a
hydrophobic amino acid. The residue (C-3) next to it always has a
polar side chain, charged or uncharged (serine, glutamine, and
lysine), and the C-4 residue is either serine or threonine, both of
which are known to initiate β-turns.

A)
     1               5                   10                  15
NH2-Asn-Leu-Gly-Asp-Gln-Glu-Thr-Gln-Gly-Gln-Asp-Pro-Pro-Pro-Glu-
     47  42  36  24  35  22  17  23  19  22  11  16  16  17  13

B)
Labels in the figure: gag, pol, p10; position markers 2104 and 2111.

DNA sequence
    ..CTCAACTTAGAAGATTAGGAGAGTCAGGGCCAGGACCCCCCCCCCTGAGCCCAGGATA..
Predicted amino acid sequence
    ...LeuAsnLeuGluAsp***GluSerGlnGlyGlnAspProProPro***
    ***LysIleArgArgValArgAlaArgThrProProProGluProArgIle..
FeLV protease
    NH2-AsnLeuGlyAspGlnGluThrGlnGlyGlnAspProProProGlu..
Mo-MuLV protease
    NH2-ThrLeuAspAspGlnGlyGlyGlnGlyGlnGluProProProGluProArgIle..

C)
                         C-4 C-3 C-2 C-1
p15*p12        FeLV     -Ser-Ser-Leu-Tyr*Pro-Ala-Leu-Thr-
               MuLV     -Ser-Ser-Leu-Tyr*Pro-Ala-Leu-Thr-
p12*p30        FeLV     -Ser-Gln-Ala-Leu*Pro-Leu-Arg-Glu-
               MuLV     -Ser-Gln-Ala-Phe*Pro-Leu-Arg-Ala-
p30*p10        FeLV     -Thr-Lys-Val-Leu*Ala-Thr-Val-Val-
               MuLV     -Ser-Lys-Leu-Leu*Ala-Thr-Val-Val-
p10*protease   FeLV     -Ser-Thr-Leu-Leu*Asn-Leu-Glu-Asp-
               MuLV     -Thr-Ser-Leu-Leu*Thr-Leu-Asp-Asp-

FIG. 2. (A) NH2-terminal sequence of FeLV protease. The number below each residue is the yield (in picomoles) of the phenylthiohydantoin
amino acid. (B) Alignment of the NH2-terminal amino acid sequence with the DNA sequence of FeLV (10). The amber codon UAG is
translated into glutamine (underlined). Lines above amino acids indicate differences between deduced and determined sequences. Dotted lines
under amino acids indicate differences between FeLV and Mo-MuLV protease (20). (C) Comparison of gag cleavage site sequences of FeLV
and Mo-MuLV.
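The consensus described in the text can be checked mechanically against the eight cleavage sites listed in Fig. 2C. In the sketch below the four residues upstream of each scissile bond are written in one-letter code in the order C-4, C-3, C-2, C-1. The residue classes used for "hydrophobic" and "polar" are an assumption made for illustration, since the text names only some of the residues involved; all identifiers are hypothetical.

```python
# Minimal sketch: test the C-4/C-3/C-2 consensus against the sites in Fig. 2C.
# Assumption: the residue classes below are illustrative, not the authors' own.

HYDROPHOBIC = set("AVLIMFW")
POLAR = set("STNQKRDEH")       # polar side chains, charged or uncharged
TURN_INITIATORS = set("ST")    # serine or threonine

# Residues C-4, C-3, C-2, C-1 upstream of each cleaved bond.
SITES = {
    ("p15/p12", "FeLV"): "SSLY",
    ("p15/p12", "MuLV"): "SSLY",
    ("p12/p30", "FeLV"): "SQAL",
    ("p12/p30", "MuLV"): "SQAF",
    ("p30/p10", "FeLV"): "TKVL",
    ("p30/p10", "MuLV"): "SKLL",
    ("p10/protease", "FeLV"): "STLL",
    ("p10/protease", "MuLV"): "TSLL",
}


def fits_consensus(upstream):
    """True if C-4 is Ser/Thr, C-3 is polar and C-2 is hydrophobic."""
    c4, c3, c2, _c1 = upstream
    return c4 in TURN_INITIATORS and c3 in POLAR and c2 in HYDROPHOBIC


results = {key: fits_consensus(seq) for key, seq in SITES.items()}

if __name__ == "__main__":
    for (junction, virus), ok in results.items():
        print(junction, virus, "fits" if ok else "does not fit")
```

The check says nothing about C-1 or the residues downstream of the bond, which the text notes are not identical between sites.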

The exact mechanism of suppression is not clear. However, it is quite
possible that the insertion of glutamine results from the misreading
of the termination codon (UAG) by normal glutamyl tRNA as we proposed
for Mo-MuLV (18). Other translational control mechanisms, such as
suppression by nonsense suppressor tRNA (7), by splicing, and by
frameshift suppression (1, 5), have also been observed in both
procaryotic and eucaryotic cell systems. The effects of the
surrounding sequences on the suppression of a nonsense codon have also
been investigated in bacteria (12). Retroviruses provide a useful
model system for studying translational control in eucaryotic cells.
It will be of interest to compare the influence of neighboring
sequences on the suppression of certain amber termination codons in
retroviral mRNAs.

Recombinant vaccinia viruses were used to study the processing of hepatitis C virus (HCV) nonstructural
polyprotein precursor. HCV-specific proteins and cleavage products were identified by size and by
immunoprecipitation with region-specific antisera. A polyprotein beginning with 20 amino acids derived from the
carboxy terminus of NS2 and ending with the NS5B stop codon (amino acids 1007 to 3011) was cleaved at the 
NS3/4A, NS4A/4B, NS4B/5A, and NS5A/5B sites, whereas a polyprotein in which the putative active site serine 
residue was replaced by an alanine remained unprocessed, demonstrating that the NS3-encoded serine-type 
proteinase is essential for cleavage at these sites. Processing of the NS3'-5B polyprotein was complex and 
occurred rapidly. Discrete polypeptide species corresponding to various processing intermediates were 
detected. With the exception of NS4AB-5A/NS5A, no clear precursor-product relationships were detected. 
Using double infection of cells with vaccinia virus recombinants expressing either a proteolytically inactive 
NS3'-5B polyprotein or an active NS3 proteinase, we found that cleavage at the NS4A/4B, NS4B/5A, and 
NS5A/5B sites could be mediated in trans. Absence of trans cleavage at the NS3/4A junction together with the 
finding that processing at this site was insensitive to dilution of the enzyme suggested that cleavage at this site 
is an intramolecular reaction. The rrarcs -cleavage assay was also used to show that (i) the first 211 amino acids 
of NS3 were sufficient for processing at all trans sites and (ii) small deletions from the amino terminus of NS3 
selectively affected cleavage at the NS4B/5A site, whereas more extensive deletions also decreased processing 
efficiencies at the other sites. Using a series of amino-terminaUy truncated substrate polyproteins in the trans- 
cleavage assay, we found that NS4A is essential for cleavage at the NS4B/5A site and that processing at this 
site could be restored by NS4A provided in cis (i.e., together with the substrate) or in trans (i.e., together with 
the proteinase). These results suggest that in addition to the NS3 proteinase, NS4A sequences play an 
important role in HCV polyprotein processing. 



Infection with hepatitis C virus (HCV) is considered to be 
the major cause of posttransfusion and sporadic, community- 
acquired non-A, non-B hepatitis (for a recent review, see 
reference 23). It can lead to various clinical manifestations, 
including acute hepatitis, chronic hepatitis, liver cirrhosis, or 
an asymptomatic carrier state (22). In addition, the high 
prevalence of anti-HCV antibodies in chronically infected 
anti-hepatitis B antigen-negative patients with hepatocellular 
carcinoma indicates a strong association between HCV infec- 
tion and tumor development (9, 12, 15, 29, 35). 

Since the initial cloning (10), a number of new HCV isolates 
have been characterized (for a summary, see references 4 and 
31). As deduced from infectivity studies with chimpanzees and 
from comparative sequence analyses, HCV has been classified 
as a separate genus in the family Flaviviridae together with 
pestiviruses and flaviviruses (7, 11, 27). These viruses have in
common a virion with a lipid envelope, a single-stranded RNA
genome of positive polarity encoding one long open reading
frame of ca. 3,000 to 4,000 codons, and a similar genomic
organization. The structural proteins (in the case of HCV,
core-envelope 1 [E1]-E2) are located in the amino-terminal
region and are followed by the nonstructural (NS) proteins 
(NS2, NS3, NS4A/B, and NS5A/B for HCV). 

Production of mature proteins from the polyprotein precursor is
accomplished by a series of proteolytic cleavages. Recent studies have
shown that HCV structural proteins are generated from the polyprotein
by host cell signalases (18, 19, 24, 26, 32), whereas processing of
the NS polyprotein requires at least two virus-encoded enzymes: the
NS2-3 proteinase (16, 20) cleaving between NS2 and NS3 and the NS3
proteinase cleaving at all the sites further downstream (2, 13, 17,
20, 36). Biochemical and mutational analyses suggest that the NS2-3
proteinase is a zinc-dependent metalloproteinase (20), which is, to
our knowledge, without precedent among flaviviruses and pestiviruses.
In contrast, NS3 appears to be a serine-type proteinase as are
flavivirus NS3 and pestivirus p80 proteinases (5, 6, 8, 14, 28, 37).

In this report, recombinant vaccinia viruses were used to 
further characterize HCV polyprotein processing and to define 
the activity of the NS3 proteinase more precisely. Our results 
strongly suggest that the carboxy terminus of NS3 is generated 
by an intramolecular reaction, whereas cleavage at all sites 
further downstream can be mediated in trans by a proteinase 
containing the amino-terminal 211 amino acids of NS3. Fur- 
thermore, we have found that sequences from the NS4 region 
play an important role in polyprotein processing, especially for 
cleavage at the NS4B/5A site. 

MATERIALS AND METHODS 

Construction of plasmids for homologous recombination. 

The basic plasmids pATA 1007-2234, pATA 1007-1830, and pBSK 1007-1912
containing HCV sequences inserted into the modified vaccinia virus
recombination vector pATA-18 or into the T7 transcription plasmid pBSK
(Stratagene, Zurich, Switzerland) have been described before (2)
(numbers refer to the first and the last amino acids of the expressed
HCV polyprotein fragment). To obtain plasmid pATA 1007-3011 bearing
either a wild-type or mutated NS3 proteinase, pATA 1007-2234/wild type
(wt) and pATA 1007-2234/S→A (in which the putative active site serine
residue was replaced by an alanine residue [2]) were restricted with
SpeI at the multiple cloning site (MCS) and HpaI at HCV nucleotide
position 5969 (according to the nomenclature of Kato et al. [25]) and
combined with HpaI-Bsu36I (5969 to 8134) and Bsu36I-SpeI (8134 to MCS)
HCV fragments. These fragments were isolated from plasmids in which
HCV sequences cloned from the serum of a chronically HCV-infected
patient had been inserted (2). Plasmids containing HCV sequences from
NS4A to NS5B (amino acids 1658 to 3011 of the polyprotein), NS4B to
NS5B (1712 to 3011), or NS5A to NS5B (1973 to 3011) were constructed
by PCR with upstream primers containing at their 5' ends an EcoRI
restriction site and an ATG codon. After 10 cycles, DNA fragments were
purified by preparative gel electrophoresis, restricted with EcoRI
(HCV nucleotide position 6687), and inserted into the EcoRI-digested
plasmid pATA 1007-3011. Plasmid pATA 1007-1647 was obtained by
insertion of an EcoRI (sticky end)-W5tl (blunt ended after treatment
with the Klenow enzyme) HCV fragment into pATA-18 cut with EcoRI
(sticky end) and BamHI (blunt ended). To construct plasmid pATA
1007-1564, an EcoRI (MCS)-SruI (nucleotide position 5020) fragment
isolated from pBSK 1007-1912 was inserted in pATA-18 restricted with
EcoRI and SpeI. Plasmid pATA 1007-1355 was obtained by insertion of an
EcoRI (MCS)-BstEII (nucleotide position 4391) fragment into the
EcoRI-SmaI-restricted pATA-18. To obtain plasmid pATA 1007-1395, an
EcoRI-XbaI fragment (MCS to 4513) was isolated from plasmid pBSK
Δ4513-5050 (2) and inserted into pATA-18 restricted with EcoRI and
SpeI. Plasmids pATA 1007-1269 and pATA 1007-1238 were constructed in
the same way with pBSK Δ4133-4783 and pBSK Δ4040-4713 (2) to isolate
the HCV DNA fragments. Construction of the basic plasmids carrying
various 5'-deleted NS3 fragments has been described recently (2).
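Because the constructs are named by the first and last polyprotein residues they express, it is easy to tabulate which NS3-dependent junctions each one physically contains. The sketch below uses only boundaries that can be derived from this section: NS4A, NS4B, and NS5A begin at residues 1658, 1712, and 1973, so the P1 residues of the NS3/4A, NS4A/4B, and NS4B/5A junctions are 1657, 1711, and 1972. The NS5A/5B junction is left out because its position is not stated in this section. Names are hypothetical, and containing a junction does not by itself mean that the junction is cleaved.

```python
# Minimal sketch: which junctions fall inside a construct named "first-last".
# Assumption: P1 positions are derived from the construct boundaries quoted
# in the text; the NS5A/5B site is omitted because it is not given here.

JUNCTION_P1 = {
    "NS3/4A": 1657,
    "NS4A/4B": 1711,
    "NS4B/5A": 1972,
}


def junctions_in(first, last):
    """Junctions whose P1 and P1' residues both lie within first..last."""
    return [
        name
        for name, p1 in JUNCTION_P1.items()
        if first <= p1 and p1 + 1 <= last
    ]


CONSTRUCTS = {
    "1007-3011": (1007, 3011),
    "1007-2234": (1007, 2234),
    "1007-1830": (1007, 1830),
    "1007-1647": (1007, 1647),
    "1658-3011": (1658, 3011),
    "1712-3011": (1712, 3011),
    "1973-3011": (1973, 3011),
}

if __name__ == "__main__":
    for name, (first, last) in CONSTRUCTS.items():
        print(name, junctions_in(first, last))
```

For example, a substrate starting at residue 1712 contains the NS4B/5A junction but no NS4A sequence, which is the kind of construct used later to ask whether NS4A is needed for cleavage at that site.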

Expression of HCV-specific proteins in Escherichia coli and
generation of antisera. Expression of NS3-, NS3/4-, and NS5A-specific
polyprotein fragments has been described previously (2) (see Fig. 2A).
To express the NS5B-specific protein, a DNA fragment from nucleotide
positions 7587 to 8207 (amino acids 2419 to 2625 of the polyprotein)
was amplified by PCR with oligonucleotides carrying BamHI restriction
sites at their 5' ends and was inserted into the BamHI-restricted
vector pDS 561/RBSII 6xHis (33). After induction, the protein was
purified by metal chelate affinity chromatography under denaturing
conditions (33). Further purification of the protein and generation of
antisera were done exactly as described previously (2). The antibody
titer as determined by enzyme-linked immunosorbent assay was
1:512,000.
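The nucleotide and amino acid coordinates quoted for the NS5B fragment are mutually consistent, which can be seen by converting between the two numbering schemes. The sketch below infers the first nucleotide of the open reading frame from the pair (nucleotide 7587, amino acid 2419); that starting position is an inference from the numbers in this paragraph, not a value stated in the text, and it applies only to the numbering scheme the authors follow. Function names are hypothetical.

```python
# Minimal sketch: convert between nucleotide and polyprotein coordinates.
# Assumption: the ORF start is inferred from one (nucleotide, residue) pair
# quoted in the text; it is not given explicitly there.


def infer_orf_start(nt_first, aa_first):
    """First nucleotide of codon 1, given the first nucleotide of codon aa_first."""
    return nt_first - 3 * (aa_first - 1)


def aa_to_nt(aa, orf_start):
    """(first, last) nucleotide positions of the codon for residue aa."""
    first = orf_start + 3 * (aa - 1)
    return first, first + 2


def nt_to_aa(nt, orf_start):
    """Residue number whose codon contains nucleotide nt."""
    return (nt - orf_start) // 3 + 1


ORF_START = infer_orf_start(7587, 2419)

if __name__ == "__main__":
    print(ORF_START)
    print(aa_to_nt(2625, ORF_START))  # should end at nucleotide 8207
    print(nt_to_aa(8207, ORF_START))
```

The fragment length also checks out directly: 8207 - 7587 + 1 = 621 nucleotides, or 207 codons, matching residues 2419 to 2625.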

Generation of recombinant vaccinia viruses. Human tk− 143 cells were
infected with the temperature-sensitive vaccinia virus vts-1 (kindly
provided by Hans-Jurgen Schlicht, University of Ulm, Ulm, Germany) at
a multiplicity of infection of 1 for 1 h at room temperature in
Dulbecco's modified minimal essential medium (DMEM). After removal of
the inoculum, cells were incubated in medium supplemented with 10%
fetal calf serum (FCS) for 2 h at 33°C. Medium was removed, cells were
washed several times with phosphate-buffered saline (PBS), and the
calcium phosphate precipitate, prepared as described previously (3)
with 1 µg of vaccinia virus wild-type DNA and 0.5 to 2 µg of plasmid
DNA in a total volume of 250 µl, was added dropwise. After 1 h at room
temperature, DMEM without FCS was added, cells were incubated for 2 h
at 40°C and washed four times with PBS, and FCS-containing medium was
added. Cells were lysed after 48 h at 40°C in 10 mM
N-2-hydroxyethylpiperazine-N'-2-ethanesulfonic acid, and the lysate
was used for infection of human tk− 143 cells for 1 h at room
temperature as described above. Finally, complete medium containing
100 µg of 5-bromo-2'-deoxyuridine (Sigma, Deisenhofen, Germany) per ml
was added, and cells were incubated for 48 h at 37°C. Large-scale
preparations of recombinant vaccinia viruses were grown on HeLa cells,
and titers of infectious progeny were determined by plaque assay on
human tk− 143 cells.

FIG. 1. HCV genome structure and expression constructs. (A)
Diagram of the HCV genome encoding the structural proteins in the
5'-terminal quarter followed by the NS proteins 2 to 5B. The 5' and 3'
nontranslated regions are indicated as thin lines. A detailed view of the
NS protein region (amino acids 809 to 3011 of the polyprotein) is
drawn below. Cleavage sites of the NS2-3 proteinase (16, 20) and
the NS3 proteinase (2, 13, 17, 20, 36) are given. Numbers above
the arrows refer to the amino acids at the P1 positions of the scissile
bonds. (B) HCV polyprotein expression constructs used in this study.
Lines depict regions of the HCV genome expressed with recombinant
vaccinia viruses and are drawn to scale and oriented with respect to the
diagram in panel A. Numbers refer to the first and last amino acids of
the HCV polyprotein expressed with the recombinant vaccinia viruses.

Metabolic labelling of infected cells and characterization of 
HCV-specific proteins. HeLa cells were infected at a multiplic- 
ity of infection of 5 to 10 as described above and 16 h later were 
incubated in methionine- and FCS-free medium for 1 h. After 
addition of the same medium supplemented with 100 p,Ci of 
[ 35 S]methionine (Amersham Life Science, Braunschweig, Ger- 
many) and incubation for various times, cells were rysed in 
TNE (10 mM Tris-HCl [pH 8.0], 100 mM Nad, 1 mM EDTA) 
with 1% Triton X-100. The lysate was clarified by a 15-min 
centrifugation at 15,000 X g at 4°C, and proteins in the 
supernatant were precipitated by the addition of 2% sodium 
dodecyl sulfate (SDS) and 5% trichloroacetic acid. After 10 to 
30 min at 0°C, precipitated proteins were collected by a 5-min 
centrifugation at 6,000 X g and resolved in protein sample 
buffer (200 mM Tris-HCl [pH 8.8], 5 mM EDTA, 2% SDS, 1% 
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FIG. 2. Proteolytic processing of an NS3'-5B polyprotein expressed with recombinant vaccinia viruses. (A) Schematic representation of the 
HCV genome segment expressed under control of the vaccinia virus 11-kDa late promoter. Fragments of the polyprotein used to generate antisera 
are indicated as dotted bars. The apparent molecular masses (in kilpdaltons) of individual processing products are given below. (B) Analysis of 
HCV-specific proteins isolated from HeLa cells infected with recombinants expressing an unaltered NS3'-5B polyprotein (wl 007-30 11 Avt) or an 
NS3'-5B with an enzymatically inactive NS3 proteinase (wlOO7-3011/S*A). Cells were labelled for 1 h with [ 35 S]methionine, and proteins were 
isolated from the lysate by immunoprecipitation. To demonstrate the specificities of the detected proteins, immunoprecipitations were performed 
in the absence (-) or presence (+) of a homologous competitor. Numbers to the right refer to the sizes of marker proteins (in kilodaltons). 



2-mercaptoethanol, 10% sucrose, and 0.1% bromophenol 
blue). After 5 min of boiling, samples were diluted to RIPA 
buffer by adding 20 volumes of RIPA buffer without SDS (PBS, 
1% Triton X-100, 0.5% sodium deoxycholate) containing 1 
mM phenylmethylsuifonyl fluoride (Sigma). Finally, 15 pJ of 
packed protein A-Sepharose containing preadsorbed immuno- 
globulin (corresponding to 3 to 6 pi of antiserum) was added 
and the samples were incubated overnight at 4°C with agita- 
tion. After three washes of the immunocomplexes with RIPA 
buffer, protein sample buffer containing 3.3% SDS and 2% 
2-mercaptoethanol was added, samples were boiled for 5 min, 
and half of the material was analyzed by SDS-polyacrylamide 
gel electrophoresis and fluorography. Competitions were done 
by adding 10 µg of purified homologous antigen to the 
immunoprecipitation. 
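
The dilution step in this protocol lowers the SDS concentration of the boiled sample before antibody is added. The arithmetic can be sketched as follows; the helper name is hypothetical, and the calculation assumes that the 20 volumes of SDS-free buffer are added to one volume of sample containing 2% SDS, as the text describes.

```python
def diluted_percent(stock_percent, volumes_added):
    """Concentration after adding `volumes_added` volumes of solute-free
    diluent to one volume of stock solution."""
    return stock_percent / (1.0 + volumes_added)

# 2% SDS sample buffer plus 20 volumes of RIPA buffer without SDS.
sds_final = diluted_percent(2.0, 20)
print(f"final SDS concentration: {sds_final:.3f}%")
```

Under these assumptions the final SDS concentration is about 0.1%, a 21-fold reduction from the sample buffer.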

In vitro transcription and translation. A detailed descrip- 
tion of these methods is given in reference 2. In brief, the 
linearized plasmid pBSK 1007-1912 was used for in vitro 
transcription, and after phenol extraction and ethanol precip- 
itation, RNA was quantified by comparison of a serial dilution 
of the RNA with a concentration standard after electrophore- 
sis through an agarose gel. RNA was used for in vitro 
translations in a rabbit reticulocyte lysate according to the 
instructions of the manufacturer (Promega, Heidelberg, Ger- 
many). 



RESULTS 

Processing activity of an HCV NS3'-5B polyprotein. The 

goal of our studies was an examination of HCV polyprotein 
processing mediated by the NS3 serine-type proteinase by 
using an NS3'-5B polyprotein expressed with recombinant 
vaccinia viruses (Fig. 1) (a prime indicates an HCV protein 
with a nonauthentic amino or carboxy terminus). This polypro- 
tein was selected because it was expressed to much higher 
levels than the full-length polyprotein (1a) and because it is 
predicted to properly reflect all the cleavage reactions medi- 
ated by the NS3 proteinase for two reasons: (i) we (2) and 
others (13, 17, 20, 36) have shown that sequences amino 
terminal to NS3 are dispensable for all NS3-mediated cleav- 
ages, and (ii) mutational ablation of the NS2-3 proteinase does 
not affect processing at any of the NS3-dependent cleavage 
sites (16, 20). The NS3'-5B polyprotein includes the complete 
NS3-4AB-5AB open reading frame and initiates with 20 amino 
acids derived from the carboxy terminus of NS2 (Fig. 2, 
vv1007-3011/wt). To correlate the generation of individual 
processing products with NS3-encoded activity, a further re- 
combinant expressing an NS3'-5B polyprotein in which the 
putative active site serine residue was replaced by an alanine 
residue (vv1007-3011/S→A) was made (Fig. 2). 
Infection of HeLa cells with recombinant virus expressing 
the active proteinase generated proteins with apparent molec- 
ular masses of about 70, 27, 58, and 68 kDa corresponding to 
NS3', NS4B, NS5A, and NS5B, respectively, thus demonstrat- 
ing specific cleavage at the NS3/4A, NS4A/4B, NS4B/5A, and 
NS5A/5B sites (Fig. 2). Two additional double bands in the 
molecular mass range of about 67 and 95 kDa were consis- 
tently observed in all immunoprecipitations. They corre- 
sponded to vaccinia virus proteins binding nonspecifically to 
protein A-Sepharose since their amounts could not be reduced 
in immunoprecipitations using homologous competitors (Fig. 
2, + lanes). We did not detect NS4A, a protein of about 8 kDa, 
possibly because of low reactivity with our antiserum and low 
labelling efficiency (the predicted NS4A of our isolate would 
contain only one methionine). However, the appearance of 



NS3 and NS4B can be taken as evidence of cleavage between 
NS3 and NS4A as well as between NS4A and NS4B. 

In addition, various amounts of processing intermediates 
were detected and identified by their apparent molecular 
weights and their reactivities with the individual antisera as 
NS4AB and, much more visible in the experiments described 
below (e.g., see Fig. 3A), NS4AB-5A and NS4B-5A. None of 
these cleavage products was obtained with the polyprotein in 
which the active site serine residue was replaced by an alanine 
residue. Instead, the unprocessed polyprotein with an apparent 
molecular mass of about 220 kDa was detected, illustrating 
that the NS3-encoded proteinase is essential for cleavage at all 
sites downstream of its own carboxy terminus. 

Kinetics of NS3'-5B polyprotein processing. To identify 
possible precursors, pulse-labelling experiments were carried 




FIG. 3. Kinetic analyses of NS3'-5B polyprotein processing by continuous and pulse-chase labelling. (A) Cells were infected with the 
recombinant expressing NS3'-5B and 16 h later radiolabeled with [35S]methionine for the indicated times. (B) Infected cells were labelled for 20 
min and, after being washed, incubated in nonradioactive medium for various times. HCV-specific proteins were isolated from the cell lysate by 
immunoprecipitation with the indicated antisera. 

FIG. 4. Cleavage of an NS3'-5B polyprotein in trans. (A) Schematic representation of the HCV NS3'-5B and NS3'-5A' polyproteins and the 
HCV polyproteins (wt) or polyproteins with an inactive NS3 proteinase (S→A). Sixteen hours after infection, cells were radiolabeled for 1 h and 
then underwent cell lysis and immunoprecipitation (IP). 



out with HeLa cells infected with vaccinia virus recombinants 
expressing the enzymatically active NS3'-5B polyprotein. The 
processed NS3' was already visible after 5 min of labelling (Fig. 
3A, left panel); labelling times as short as 2 min still allowed 
clear detection of this protein (data not shown), suggesting that 
cleavage at the NS3/4A site is a rapid event. NS4B was 
detected after 60 min of continuous labelling (25 min with 
prolonged exposures), indicating delayed cleavage between 
NS4A and NS4B. In contrast, the NS4AB processing interme- 
diate was generated very rapidly, being detectable after 5 min 
of labelling (Fig. 3A, left panel). NS5A and NS5B could be 
detected after 15 and 10 min of labelling, respectively. In 
addition to the mature processing products, several higher- 
molecular-mass proteins of about 85, 95, and 170 kDa were 
visible. Antiserum to NS3/4 and NS5A reacted with the 170 
and 95 kDa proteins, whereas the 85 kDa protein was precip- 
itated only by the NS5A-specific antiserum. None of these 
proteins reacted with the antiserum to NS5B. On the basis of 
these reactivities and deduced molecular masses, the 170-kDa 
protein can be designated NS3'-5A, the 95-kDa protein can be 
designated NS4AB-5A, and the 85-kDa protein can be desig- 
nated NS4B-5A. No NS5B-specific processing intermediates 
were detected. 

To analyze the stabilities of these proteins and to identify 
possible precursor-product relationships, pulse-chase experi- 
ments were performed (Fig. 3B). Results from immunoprecipi- 
tations with the NS3/4-specific antiserum revealed that NS3' 
forms a stable peptide with a half-life of about 4 h. In con- 



trast, the NS3'-5A and NS4AB intermediates were unstable 
and disappeared with half-lives of about 45 and 15 min, 
respectively. The only protein with a clear precursor-product 
relationship was NS5A, whose level increased over time con- 
comitantly with a decrease of NS4AB-5A and NS4B-5A inter- 
mediates. Using longer pulse intervals, we could also detect 
NS4B but found it to be very unstable, with a half-life of only 
about 15 min (data not shown). The only protein detected with 
the NS5B-specific antiserum was NS5B (Fig. 3B), a stable 
protein decaying with a half-life of about 6 h. 
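
The half-lives quoted above can be converted into the fraction of labelled protein expected to remain after a given chase. The following is a minimal sketch, assuming single-exponential decay (a model the text does not state explicitly); the function names and the 60-min chase time are illustrative only.

```python
import math

# Approximate half-lives (in minutes) quoted in the text above.
HALF_LIFE_MIN = {
    "NS3'": 4 * 60,
    "NS3'-5A": 45,
    "NS4AB": 15,
    "NS4B": 15,
    "NS5B": 6 * 60,
}

def fraction_remaining(chase_min, half_life_min):
    """Fraction of labelled protein left after a chase, assuming exponential decay."""
    return 0.5 ** (chase_min / half_life_min)

def half_life_from_two_points(signal_start, signal_end, elapsed_min):
    """Half-life implied by two band intensities, under the same assumption."""
    return elapsed_min * math.log(2) / math.log(signal_start / signal_end)

for protein, t_half in HALF_LIFE_MIN.items():
    left = fraction_remaining(60, t_half)
    print(f"{protein}: {left:.2f} of the label left after a 60-min chase")
```

The second helper runs the calculation in reverse: given two quantitated band intensities and the time between them, it returns the half-life that an exponential decay would imply.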

Processing of an NS3'-5B polyprotein in trans. So far we had 
used a polyprotein in which the proteinase was part of the 
polyprotein and therefore directly linked with its substrate. To 
differentiate whether processing at the various sites was strictly 
an intramolecular reaction or could also occur intermolecu- 
larly, transcomplementation experiments were performed (Fig. 
4). Cells were infected with a recombinant vaccinia virus 
expressing the proteolytically inactive NS3'-5B polyprotein 
along with a recombinant directing the expression of an 
NS3'-5A' protein with an enzymatically active proteinase 
domain (Fig. 4A). As shown in Fig. 4B (lanes 15 to 18), this 
double infection yielded proteins corresponding to NS5A (lane 
17) and NS5B (lane 18), both of which were not detected in 
cells expressing either inactive NS3'-5A' or NS3'-5B protein 
(lanes 9 to 11 and 1 to 4, respectively) nor in cells expressing 
both defective polyproteins together (lanes 19 to 22). A novel 
processing product of 78 kDa which was not produced by the 
active NS3'-5B polyprotein (Fig. 4B, lanes 5 to 8) was observed 
FIG. 5. Cleavage at the NS3/4A site is insensitive to dilution. (A) 
Various amounts of an HCV RNA encoding an NS3'-4B' polyprotein 
(amino acids 1007 to 1912) were used for in vitro translation with the 
rabbit reticulocyte lysate. After 60 min at 30°C, proteins were analyzed 
by electrophoresis in an 11% polyacrylamide gel and by fluorography. 
Numbers above each lane refer to the amount of RNA (in micrograms 
per milliliter) used for each translation. The positions of marker 
contained in the NS3'-4B' and NS3' proteins were quantitated with a 
phosphoimager. The precursor product ratio obtained with 20 µg/ml 
was set as 100. 



reacting with the NS3/4-specific antiserum (compare lanes 6 
and 16). The size and immunoreactivity of this protein, to- 
gether with the fact that no NS4AB-5A was observed with this 
infection but only an NS4B-5A intermediate (Fig. 4B; compare 
lanes 7 and 17), suggested that this polypeptide corresponded 
to an unprocessed NS3'-4A fusion protein. It should be noted 
that the labelling procedure required to monitor efficient trans 
cleavage and the kinetics of processing in trans (see below) did 
not allow clear detection of the unstable NS4B (half-life of 
about 15 min). However, the appearance of NS3'-4A and 
NS5A is taken as evidence of processing at the NS4A/4B and 
NS4B/5A junctions. Taken together, we found that cleavage 
between NS4A and NS4B, NS4B and NS5A, and NS5A and 
NS5B could be mediated in trans, whereas processing at the 
NS3/4A site appeared to occur essentially in cis, i.e., intramo- 
lecularly. 

If cleavage between NS3 and NS4A is an intramolecular 
reaction, it should follow first-order reaction kinetics and be 
concentration independent. Therefore, we examined the pro- 
tein products obtained by in vitro translation of an NS3'-4AB' 
RNA using different amounts of RNA to effectively change the 
concentration of the proteinase. As shown in Fig. 5A, transla- 
tion of this RNA yielded unmistakable amounts of unpro- 
cessed precursor and the NS3' cleavage product, while NS4AB' 
could not be detected under these conditions (2). Quantitation 
of the radioactivities contained in these bands revealed no 
significant difference in the precursor-to-product ratio, irre- 
spective of the RNA concentration (Fig. 5B). The same result 
was found with shorter and longer translation times (data not 
shown). These results together with the finding that cleavage 



between NS3 and NS4A appeared to occur in cis strongly 
suggested that processing at this site was an intramolecular 
reaction. 
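
The reasoning behind the dilution experiment can be illustrated numerically. The sketch below contrasts a first-order (intramolecular) cleavage with an intermolecular cleavage whose effective rate scales with proteinase concentration. All rate constants and concentrations are hypothetical values chosen only to show the trend, and the trans model treats the proteinase concentration as constant during the reaction, which is a simplifying assumption.

```python
import math

def cleaved_fraction_cis(k_cis_per_min, minutes):
    """Intramolecular cleavage: first order, independent of concentration."""
    return 1.0 - math.exp(-k_cis_per_min * minutes)

def cleaved_fraction_trans(k_trans, proteinase_conc, minutes):
    """Intermolecular cleavage (pseudo-first order): the effective rate
    constant is proportional to the proteinase concentration."""
    return 1.0 - math.exp(-k_trans * proteinase_conc * minutes)

# Hypothetical parameters, not taken from the experiments described above.
K_CIS = 0.02      # per minute
K_TRANS = 0.002   # per (relative concentration unit x minute)
relative_conc = [1, 2, 5, 10, 20]

cis = [cleaved_fraction_cis(K_CIS, 60) for _ in relative_conc]
trans = [cleaved_fraction_trans(K_TRANS, c, 60) for c in relative_conc]

for c, f_cis, f_trans in zip(relative_conc, cis, trans):
    print(f"conc {c:>2}: cis {f_cis:.2f}  trans {f_trans:.2f}")
```

In this toy model the cis fraction is identical at every concentration, whereas the trans fraction rises with concentration; a constant precursor-to-product ratio across RNA amounts is therefore what an intramolecular reaction predicts.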

To analyze the kinetics of polyprotein processing in trans, 
pulse-chase experiments were performed with the proteolyti- 
cally inactive NS3'-5B polyprotein as a substrate and with an 
NS3'-4B' proteinase (Fig. 6A). Cleavage was slower than 
processing of the enzymatically active NS3'-5B (Fig. 3) because 
obvious amounts of unprocessed substrate were detected even 
after a 2-h chase (Fig. 6B). Amounts of NS5B and NS5A 
increased with time, reaching maximum levels after about 1 
and 2 h, respectively. Surprisingly, processing between NS4A 
and NS4B occurred with the fastest kinetics as shown by the 
rapid production of NS3'-4A and NS4B-5AB. This is in 
contrast to the results obtained with the enzymatically active 
NS3'-5B, when cleavage at this site was delayed (Fig. 3). It 
should be noted that essentially the same kinetics were found 
with the NS3'-5A' proteinase described above (data not 
shown), showing that sequences downstream of NS4B are 
dispensable for full proteolytic activity. 

Much slower kinetics of trans cleavage and a slightly differ- 
ent pattern of processing products and processing intermedi- 
ates were observed with an NS3 proteinase lacking NS4 
sequences and 10 amino acids from the carboxy terminus of 
NS3 (vv1007-1647) (Fig. 6C). Compared with the results 
obtained with the NS3'-4B' proteinase, processing at the 
NS4A/4B site occurred with similar kinetics as shown by 
the rapid appearance of NS3'-4A and the NS4B-5AB inter- 
mediate. However, cleavage between NS5A and NS5B and, 
most notably, between NS4B and NS5A was much slower, and 
chase periods of 6 and 1 h were required to detect obvious 
amounts of NS5A and NS5B, respectively. Furthermore, an 
unmistakable accumulation of the NS4B-5A processing inter- 
mediate during the 6-h chase was observed. Since essentially 
the same results were found with an NS3 proteinase with the 
authentic carboxy terminus (amino acid 1657 of the polypro- 
tein) (data not shown), sequences of the NS4 region coex- 
pressed with the proteinase enhance processing efficiencies at 
all trans-cleavage sites, particularly between NS4B and NS5A 
(see below). 

Mapping of the minimal NS3 domain required for proteo- 
lytic activity. Sequence comparisons between NS3 of HCV and 
other viral and nonviral proteinases suggest that the proteolytic 
activity is located in the amino-terminal domain of the mole- 
cule. Hence, to map the minimal NS3 domain required for 
polyprotein processing, a series of recombinant vaccinia vi- 
ruses was generated, directing the expression of various NS3 
proteins beginning at amino acid 1007 of the polyprotein and 
ending at different positions in the NS3 region. These recom- 
binants were used for double infections along with the recom- 
binant expressing the inactive NS3'-5B substrate polyprotein. 
Because of the slow reaction kinetics observed with protein- 
ases lacking NS4 sequences, in this and all subsequent exper- 
iments, cells were radiolabeled for 1 h and then chased for 6 
h. To exclude possible cleavage products generated by cellular- 
or vaccinia virus-encoded enzymes, infections with wild-type 
vaccinia virus were included. As shown by the production of 
NS3'-4A, NS5A, and NS5B (Fig. 7A, B, and C, respectively), 
all variant NS3 proteinases could process the substrate at all 
trans-cleavage sites. In independent experiments, no significant 
differences with respect to processing efficiencies and kinetics 
were found between these NS3 truncations and the full-length 
NS3 proteinase (data not shown). Variations in the amounts of 
processing products were due to variations in the expression 
levels of the NS3 proteins and to their different stabilities. 
Most of them had half-lives much shorter than those observed 
FIG. 6. Pulse-chase analysis of NS3'-5B polyprotein processing in trans by the NS3 proteinase in the presence and absence of NS4 sequences. 
(A) Schematic representation of the enzymatically inactive NS3'-5B polyprotein substrate and two variant proteinases. (B) Cells were infected with 
a combination of the recombinant expressing the inactive polyprotein and a recombinant expressing an enzymatically active NS3'-4B' proteinase 
(amino acids 1007 to 1830 of the polyprotein). Sixteen hours after infection, proteins were radiolabeled metabolically for 20 min and then 
incubated in nonradioactive medium for the indicated times. (C) Results from an analogous experiment with the same NS3'-5B substrate and an 
almost complete NS3 proteinase lacking NS4 sequences (amino acids 1007 to 1647 of the polyprotein). 



for full-length and nearly full-length NS3 molecules and could 
be detected only in cell lysates prepared directly after the 
pulse-labelling (Fig. 7D). In summary, these results demon- 
strate that the first 211 amino acids of NS3 (expressed from 
recombinant vv1007-1238) are sufficient for processing at all 
trans-cleavage sites. 

To determine the borders of the minimal NS3 proteinase 
domain more precisely, the smallest protein was used to 
introduce a series of amino-terminal deletions which were 
tested in the same way. Removal of only 7 amino acids from 
the amino terminus of the NS3 protein abolished cleavage at 
the NS4B/5A site (vv1034-1238) (Fig. 8B) without affecting 
processing at the other sites (Fig. 8A and C). A similar pattern 
was found when 23 amino-terminal amino acids were deleted 
(vv1050-1238). Deletion of an additional 16 amino acids from 
the NS3 amino terminus also abolished cleavage at the 
NS5A/5B site, whereas processing between NS4A and NS4B 
still occurred, albeit with lower efficiency (vv1066-1238). None 



of the cleavage products was obtained with an inactive NS3 
proteinase (vv1050-1238/S→A), excluding the possibility that 
vaccinia virus or cellular enzymes could substitute for an active 
NS3 proteinase. These results show that deletions in the NS3 
domain have differential effects on processing at the various 
cleavage sites and that cleavage at the NS4B/5A site is the most 
sensitive to amino-terminal deletions in the proteinase do- 
main, whereas trans cleavage between NS4A and NS4B can be 
mediated by a proteinase as short as 172 amino acids from the 
amino-terminal region of NS3. 
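
The deletion results described above can be summarized as a small lookup table. The sketch below is one way to encode them, with constructs labelled only by their polyprotein amino acid range; the data structure and function name are illustrative, and efficiency differences (for example, the reduced NS4A/4B cleavage by the shortest construct) are not modelled.

```python
# True = trans cleavage observed at that site, as described in the text.
# Constructs are listed in order of increasing amino-terminal deletion.
CLEAVAGE_BY_CONSTRUCT = {
    "1007-1238": {"NS4A/4B": True, "NS4B/5A": True,  "NS5A/5B": True},
    "1034-1238": {"NS4A/4B": True, "NS4B/5A": False, "NS5A/5B": True},
    "1050-1238": {"NS4A/4B": True, "NS4B/5A": False, "NS5A/5B": True},
    "1066-1238": {"NS4A/4B": True, "NS4B/5A": False, "NS5A/5B": False},
}

def first_start_abolishing(site):
    """Return the start residue of the least-truncated construct that no
    longer cleaves `site`, or None if every construct still cleaves it."""
    for construct, sites in CLEAVAGE_BY_CONSTRUCT.items():
        if not sites[site]:
            return int(construct.split("-")[0])
    return None

for site in ("NS4A/4B", "NS4B/5A", "NS5A/5B"):
    print(site, first_start_abolishing(site))
```

Read this way, the NS4B/5A site is lost with the smallest deletion, the NS5A/5B site with a larger one, and the NS4A/4B site is retained by all constructs in the series.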

Further attempts to narrow down the minimal proteinase 
domain were complicated by the fact that these proteins 
probably did not react or reacted very poorly with our NS3- 
specific antiserum; therefore, the expression of these proteins 
could not be determined. 

NS4 sequences are essential for cleavage at the NS4B/5A 
site. In all the trans-cleavage assays described so far, we used 
the NS3'-5B polyprotein as a substrate. To analyze whether 
FIG. 7. Mapping of the minimal NS3 domain required for full proteolytic activity. Cells were infected either singly with one of the NS3 
recombinants or with vaccinia wild-type virus (vv-wt) (D) or in combination with the recombinant expressing the inactive NS3'-5B substrate (A to 
C). Sixteen hours later, cells were labelled for 1 h and lysed directly (D) or after a 6-h chase (A to C). HCV-specific proteins were analyzed by 
immunoprecipitation with antisera specific for NS3 (A and D), NS5A (B), or NS5B (C). As a control, the results obtained with cells expressing 
an unaltered (wt) or mutated (S→A) NS3'-5B polyprotein are shown in the two lanes on the far left. Descriptions of the individual lanes in panels 
A and B also refer to the lanes directly below in panels C and D, respectively. 



sequences of the substrate have an influence on cleavage at the 
various sites, recombinant vaccinia viruses expressing NS4A- 
5B, NS4B-5B, and NS5A-5B polyproteins were made, and we 
determined their processing patterns as generated by an 
NS3'-4B' proteinase (Fig. 9, vv1007-1830, II) or the NS3' 
proteinase lacking NS4 sequences and 10 amino acids from the 
NS3 carboxy terminus (Fig. 9, vv1007-1647, I). 

trans cleavage of the NS4A-5B substrate with either of the 
proteinases gave rise to mature NS5A and NS5B, demonstrat- 
ing cleavage at the NS4B/5A and NS5A/5B sites (Fig. 9A). 
Furthermore, small amounts of NS4B indicative of processing 
between NS4A and NS4B were detected (data not shown). 
Cleavage of the NS5A-5B polyprotein substrate by either of 
the proteinases occurred with similar efficiency (Fig. 9C). 
However, complete processing of the NS4B-5B substrate was 
observed only with the NS3'-4B' proteinase (II) (Fig. 9B), 
whereas in the case of the pure NS3 proteinase, cleavage 
occurred only at the NS5A/5B junction and no processing 
between NS4B and NS5A was detected (I) (Fig. 9B). The same 
result was found when we increased the amount of the NS3' 



proteinase by raising the multiplicity of infection of the vv1007- 
1647 recombinant. These results strongly suggest that NS4, 
particularly NS4A, is essential for processing between NS4B 
and NS5A and that NS4A provided by either the substrate (i.e., 
in cis) or the proteinase (i.e., in trans) can restore cleavage at 
this site. 

DISCUSSION 

The present study examined processing of the HCV NS 
proteins with the help of recombinant vaccinia viruses express- 
ing an NS3'-5B polyprotein. This protein was chosen because 
it contains all the sequences required for NS3-mediated 
polyprotein processing and has considerably higher expression 
levels than the complete polyprotein (1a). 

Results from kinetic studies revealed a complex pattern of 
NS polyprotein processing. The kinetics of processing between 
NS3 and NS4A were compatible with the view that cleavage at 
this site occurs first. However, in addition, small amounts of an 
NS3'-5A intermediate were detected, suggesting that alterna- 



Vol. 68, 1994    HCV POLYPROTEIN PROCESSING    5053



FIG. 8. Fine mapping of the minimal NS3 proteinase domain. Cells 
were doubly infected with the recombinant expressing the inactive 
NS3'-5B polyprotein substrate and one of several recombinants ex- 
pressing various NS3 truncations. Following radiolabelling for 1 h, cells 
were lysed either directly (D) or after a 6-h incubation in nonradioac- 
tive medium (A to C). HCV-specific proteins were isolated by immu- 
noprecipitation with the given antisera. Results obtained with cells 
expressing an unaltered (wt) or mutated (S→A) NS3'-5B polyprotein 
are shown in the two lanes on the far left. (D) To determine the 
expression of various NS3 truncations, cells were infected with the 
corresponding recombinants, labelled for 1 h, and analyzed by immu- 
noprecipitation with the NS3-specific antiserum. Descriptions of the 
individual lanes in panels A and B also refer to the lanes directly below 
in panels C and D, respectively. 



tive cleavages such as initial processing at the NS5A/5B site 
exist. Cleavage between NS4A and NS4B appeared to be 
delayed as shown by the presence of the NS4A/B intermediate 
and the slow production of NS4B, which was detected only 
after 60 min of labelling compared with 2 min in case of NS3. 

At least two cleavage pathways seem to operate at the 
NS4B/5A site: (i) rapid cleavage as indicated by rapid produc- 
tion of the NS4AB intermediate and (ii) slow cleavage as 
indicated by the presence of the relatively stable NS4AB-5A 
intermediate (half-life of ca. 1 h), for which a clear precursor- 
product relationship was found. In contrast, processing be- 
tween NS5A and NS5B appears to be rather efficient because 
no precursor of NS5B could be detected. Thus, NS5B seems to 
be generated either cotranslationally from an unprocessed 
polyprotein by an intramolecular reaction or from an unstable 
processing intermediate which is cleaved very rapidly intermo- 
lecularly. 

Different kinetics were observed in the trans-cleavage reac- 
tion. Processing at the NS3/4A site could not be detected, and 
cleavage was fastest at the NS4A/4B site. A possible explana- 
tion for this apparent difference is that in the case of the 
enzymatically active NS3'-5B polyprotein, a transient intramo- 
lecular complex between the NS3 proteinase domain and NS4 
FIG. 9. Processing of NS4A-5B (A), NS4B-5B (B), and NS5A-5B 
(C) polyproteins (amino acids 1658 to 3011, 1712 to 3011, and 1973 
to 3011 of the HCV polyprotein, respectively) in trans with recombi- 
nant vaccinia viruses expressing an almost complete NS3 proteinase 
(vv1007-1647) (I) or an NS3'-4B' proteinase (vv1007-1830) (II). For 
details, see the legend to Fig. 7. 



sequences, particularly NS4A, forms, which brings the cleavage 
site in close proximity to the active site of the enzyme to ensure 
rapid processing between NS3 and NS4A. Consistent with this 
idea, we obtained strong evidence that cleavage at this site 
occurs in cis. However, in the case of the inactive NS3'-5B 
polyprotein, the proteinase domain would remain stably asso- 
ciated with NS4, making the cleavage site inaccessible to 
proteinase molecules provided in trans. Hence, one might 
expect processing at this site to occur in trans with a polypro- 
tein lacking the NS3 proteinase domain. We are currently 
testing this hypothesis. 
Using a genetic approach, we found that sequences from the 
NS4 region, particularly NS4A, are important for efficient 
polyprotein processing, especially at the NS4B/5A site. Cleav- 
age at this site was accomplished when NS4A was present in 
the substrate (in cis) and even more efficiently when it was 
coexpressed with the proteinase (in trans). The mechanism by 
which NS4A promotes cleavage is currently not known. Several 
possibilities can be envisaged. (i) NS4A might be important for 
proper folding of the substrate. This possibility seems unlikely 
because NS4 can be provided in trans to restore cleavage and 
the NS5 substrate lacking NS4 sequences is processed prop- 
erly. (ii) NS4A may form a complex with NS3 that is important 
for NS3 function and analogous to the NS2B/3 heterodimer 
described for flaviviruses (1). However, in contrast to those 
viruses, in the case of HCV, formation of such a complex 
would not be a prerequisite for proteinase activity per se 
because an NS3 proteinase lacking NS4 sequences still could 
cleave at the NS5A/5B junction. (iii) NS4A may serve as a 
membrane anchor attaching the more hydrophilic NS3 pro- 
teinase to the membrane surface of the endoplasmic reticulum, 
where most of the HCV proteins appear to be located (21, 30). 
In this model, NS4A would bring the NS3 proteinase in close 
proximity to its substrate, allowing efficient processing, partic- 
ularly at the NS4B/5A junction. Sequence analysis of NS4A 
reveals that the first half of the molecule is very hydrophobic 
and has the potential to span the lipid bilayer once, whereas 
the other half of the molecule is very hydrophilic (5 of 10 
carboxy-terminal amino acids are acidic) and may extend into 
the cytoplasm to interact with the more hydrophilic NS3. In 
agreement with this idea, Hijikata and coworkers (21) have 
recently shown that membrane association of NS3 translated in 
vitro in the presence of microsomal membranes requires the 
presence of NS4A. It should be noted that this kind of 
interaction may contribute not only to efficient polyprotein 
processing but also to the formation of a putative membrane- 
associated replication complex (postulated by analogy to flavi- 
viruses [7]) involving the nucleoside triphosphatase and heli- 
case activities residing in the carboxy-terminal domain of NS3 
(34). 

In an attempt to narrow down the minimal NS3 domain 
required for full proteolytic activity, we found that the first 211 
amino acids of NS3 suffice for cleavage at all trans sites. 
Further amino-terminal deletions selectively abolished pro- 
cessing between NS4B and NS5A while cleavage at the other 
trans sites was not affected. Interestingly, the same phenotype, 
the loss of cleavage ability at the NS4B/5A site, was caused by 
the lack of NS4A protein. On the basis of these results, one 
might postulate specific interactions between the NS3 amino 
terminus and the NS4A protein. However, the existence and 
nature of such interactions remain to be determined. 

For many viruses, proteolytic processing mediated by virus- 
encoded proteinases plays an important role for virus replica- 
tion. Recently, it was shown that mutational ablation of the 
NS3-encoded proteinase of yellow fever virus severely reduced 
virus replication (8). Thus, it seems likely that the analogous 
proteinase of HCV plays an equally important role for the viral 
life cycle. Although we are still far away from a complete 
understanding of the pathways governing HCV polyprotein 
processing, our results are a first step in this direction and may 
provide a basis for the rational design of an antiviral drug. 
The N-terminal domain of the hepatitis C virus 
(HCV) polyprotein containing the NS3 protease (res- 
idues 1027 to 1206) was expressed in Escherichia coli 
as a soluble protein under the control of the T7 pro- 
moter. The enzyme has been purified to homogeneity 
with cation exchange (SP-Sepharose HR) and hepa- 
rin affinity chromatography in the absence of any 
detergent. The purified enzyme preparation was sol- 
uble and remained stable in solution for several 
weeks at 4°C. The proteolytic activity of the purified 
enzyme was examined, also in the absence of deter- 
gents, using a peptide mimicking the NS4A/4B cleav- 
age site of the HCV polyprotein. Hydrolysis of this 
substrate at the expected Cys-Ala scissile bond was 
catalyzed by the recombinant protease with a 
pseudo second-order rate constant (kcat/Km) of 205 
and 196,000 M⁻¹ s⁻¹, respectively, in the absence and 
presence of a central hydrophobic region (sequence 
represented by residues 21 to 34) of the NS4A pro- 
tein. The rate constant in the presence of NS4A pep- 
tide cofactor was two orders of magnitude greater 
than reported previously for the NS3 protease do- 
main. A significantly higher activity of the NS3 pro- 
tease-NS4A cofactor complex was also observed 
with a substrate mimicking the NS4B/5A site (kcat/Km 
of 5180 ± 670 M⁻¹ s⁻¹). Finally, the optimal formation 
of a complex between the NS3 protease domain and 
the cofactor NS4A was critical for the high proteo- 
lytic activity observed. © 1999 Academic Press 
Human hepatitis C virus (HCV)¹ is the major etio- 
logic agent of post transfusion non-A, non-B hepatitis 
(1,2). Chronic infection with HCV has also been linked 
to the development of liver cirrhosis and of hepatocel- 
lular carcinoma (3). Thus far, no efficient therapy ex- 
ists and there is an urgent need for the development of 
HCV-specific antiviral therapeutics. 

Along with flaviviruses and pestiviruses, HCV is a 
member of the Flaviviridae family. These viruses share 
similarities in their genomic and polyprotein organiza- 
tion. HCV contains a positive-sense linear RNA ge- 
nome of 9.5 kb with a single open reading frame en- 
coding a polyprotein of 3010 to 3033 amino acids (4-7). 
This polyprotein encodes at least 9 different proteins 
as follows: 5'-C-E1-E2-NS2-NS3-NS4A-NS4B-NS5A- 
NS5B-3' (where E denotes envelope proteins and NS 
denotes nonstructural proteins) and is proteolytically 
processed in the cytoplasm and/or in the endoplasmic 
reticulum. Both viral- and host-encoded proteases are 
involved in the production of mature viral proteins 
(8-10). First, the host signal peptidase appears to be 
responsible for cleavages in the structural NS2 region, 
generating C, E1, E2, and possibly NS2; second, the 
NS2/3 junction is cleaved by a putative HCV-encoded 
metalloprotease residing in the NS2 region; and fi- 
nally, a second viral protease, essential for processing 
most of the NS proteins, is located in NS3. It has been 
shown that the N-terminal domain of NS3 encodes a 
serine protease that is necessary but not sufficient for 
efficient cleavages downstream of NS3, i.e., at the NS3/ 
4A, NS4A/4B, NS4B/5A, and NS5A/5B junctions. The 
NS4A is an amphipathic protein of 54 amino acids, and 


1 Abbreviations used: HCV, hepatitis C virus; DTT, dithiothreitol; 
DMSO, dimethyl sulfoxide; IPTG, isopropyl-β-D-thiogalactopyr- 
anoside; Nph, para-nitrophenylalanine; NS, nonstructural. 
has a hydrophobic N-terminal domain followed by a 
hydrophilic C-terminal domain. The NS4A acts as a 
cofactor of the NS3 protease activity for an efficient 
cleavage of NS3/4A, NS4A/4B, NS4B/5A, and NS5A/5B 
sites and it interacts with the NS3 protease in trans. 
Recently, it has been suggested that the central region 
encompassing residues 21-34 of NS4A mimics the 
NS4A in its activation of NS3 in vitro (11-19). How- 
ever, little Is known about the mechanism by which the 
Intact NS4A regulates proteolytic activity of the NS3 
protease. 

The 20-kDa N-terminal domain of the NS3 protein is capable of
catalyzing hydrolysis of the cleavage site downstream from NS3 with the
same efficiency as the full-length NS3 protein; it also retains its
ability to interact with NS4A (11). Recently, several groups have
reported purification of full-length NS3 protease as a histidine fusion
protein (20) and as a maltose binding protein (MBP) fusion protein
(15,21) from Escherichia coli. The amino terminal domain of NS3
protease by itself or as a fusion protein has also been purified from
E. coli (22-24) and baculovirus (25,26) in the presence of detergents.
These enzyme preparations exhibited very low activity for all peptide
substrates examined even in the presence of NS4A cofactor
(15,22,25,27,28).

In this report, we describe a unique and rapid two-step procedure for
the purification of large quantities of the protease domain as a 19-kDa
recombinant protein of NS3 protease from E. coli in the complete
absence of detergents. Purification, substrate cleavage assays, and
screening of compounds as inhibitors for HCV protease are simpler
without interference from detergents. We further show that complex
formation between the protease and the NS4A cofactor is important under
the assay conditions for the dramatic activation of the enzyme by the
4A cofactor peptide to exhibit cleavage efficiency almost 1000-fold
greater than the uncomplexed form, and at least 130-fold greater than
those reported previously with peptide substrates.

MATERIALS AND METHODS 

Expression of the HCV NS3 protease. The DNA encoding amino acids
1027-1206 (21) of the BK strain HCV polypeptide was cloned downstream
of the T7-7 vector, in frame with the first ATG of the protein of gene
10 of the T7 phage, to yield the plasmid pT7-7(NS3 1027-1206). This
plasmid was transformed into E. coli BL21DE3 pLysS cells (Novagen)
utilizing heat shock techniques. Cells were grown at 37°C in LB medium
containing 50 µg/ml ampicillin to an optical density of 0.4-0.6 at
600 nm, and the temperature was lowered to 25°C to allow for induction
with 400 µM isopropyl-D-(-)-thiogalactopyranoside (IPTG; Boehringer
Mannheim). The bacterial cells were grown further for two additional
hours and then pelleted with centrifugation for storage at -80°C.

Purification of HCV NS3 protease. Cells from a 10-liter culture were
resuspended at 4°C in 100 ml of lysis buffer (25 mM sodium phosphate,
pH 7.5, 1 mM EDTA, 10% glycerol, 5 mM DTT) and treated for 30 min with
0.02 mg/ml DNase (Type IIS: bovine pancreas, Sigma) in 20 mM MgCl2.
PMSF (1 mM) was added to the cell suspension and the cells were
immediately disrupted by passage six times through a Microfluidizer
(Model 110-S) at 6 bar pressure. The cell lysate was centrifuged at
10,000 rpm for 30 min, and the supernatant was loaded at 2.5 ml/min
onto a Hi-Load SP Sepharose high-performance column (26/10; Pharmacia
Biotech) which had been preequilibrated in 50 mM sodium phosphate
(pH 6.5), 10% glycerol, 1 mM EDTA, 5 mM DTT. The enzyme was eluted from
the column in a 0-1 M NaCl gradient. Fractions collected were analyzed
with sodium dodecyl sulfate-polyacrylamide gel electrophoresis
(SDS-PAGE) and those containing proteins with molecular mass similar to
the NS3 protease were pooled and diluted 8- to 10-fold into a buffer
containing 25 mM sodium phosphate (pH 7.5), 10% glycerol, 5 mM DTT, and
loaded at 3 ml/min onto four columns of 5 ml Hi-trap heparin
(Pharmacia Biotech) connected in series without additional tubing. The
enzyme was then eluted with a 0-1 M NaCl gradient. Fractions were
analyzed by SDS-PAGE and with the peptide cleavage assays. Protein
fractions demonstrating the NS3 enzymatic activity and showing greater
than 99% purity on the SDS-PAGE were pooled and stored at -80°C in the
elution buffer. The N-terminal sequence analysis was carried out by
Edman degradation on an Applied Biosystems Model 494A protein
sequencer. Protein concentrations were determined with quantitative
amino acid analyses by using a postcolumn ninhydrin derivatization
method on a Beckman 6300 analyzer.

Substrate cleavage assay. The peptides
7-methoxycoumarin-4-acetyl-DEMEECASHLPYK-(ε-NHCOCH3) and
acetyl-DEMEECASHLPYK-(ε-NHCOCH3), mimicking the NS4A/4B cleavage site
of the HCV polyprotein, were custom synthesized for us by Enzyme
Systems Products (Dublin, CA) and were >95% pure. The NS4B/5A substrate
7-methoxycoumarin-4-acetyl-EDASTPCSGS-Nph-L (where Nph is
para-nitrophenylalanine) was purchased from Bachem Biosciences. The
NS4A peptide with the sequence GSVVIVGRIILSGRKK [4Apep(21-34)KK] was
also synthesized by Enzyme Systems Products. Peptide hydrolytic assays
catalyzed by the NS3 protease were performed at 25°C in a circulating
water bath in 100 µl of a buffer containing 50 mM Hepes (pH 7.5) and
10 mM DTT with 50% glycerol. The reaction was quenched with an aliquot
of 100 µl of 5% phosphoric acid, and the mixture was analyzed with
reverse-phase high-performance liquid chromatography (HPLC) on a
4.6 × 50-mm Vydac C18 column. The cleavage products were separated
using a 0.1% trifluoroacetic acid/acetonitrile gradient and identified
by comparison of retention times against defined peptide products.
Absorbance of the eluent was monitored at 220 nm using a Waters 996
photodiode array detector or a Waters 470 fluorescence detector.
Excitation wavelength was set at 328 nm and the fluorescence emission
was monitored at 393 nm. The enzyme concentrations used in the assays
varied from 2 to 20 nM in the presence of 4Apep(21-34)KK and
500-1000 nM in the absence of the 4Apep(21-34)KK peptide. In the assays
in which the 4Apep(21-34)KK was present, the enzyme was preincubated
with 4Apep(21-34)KK for 10 min at 4°C, followed by 5 min at room
temperature at a 20- to 40-fold greater concentration, before addition
to the assay reaction mixture. For preincubation of enzyme with the 4A
peptide, the enzyme was added to the solution already containing the 4A
peptide (due to instability of NS3 by itself at the preincubation
concentrations). The NS4A peptide and substrate concentrations used
ranged from 0.1 µM to 50 µM and 0.25 µM to 100 µM, respectively. All
substrates were dissolved in 50 mM Hepes (pH 7.5), 30 mM DTT, and 10%
glycerol. The assay was typically conducted for a period of 5 min in
the absence of 4A peptides, but for less than 5 min when
4Apep(21-34)KK was present. The steady-state kinetic parameters (kcat
and Km) were determined by fitting the initial-velocity versus
substrate-concentration data to the Michaelis-Menten equation. Initial
velocity and steady-state conditions were strictly maintained for all
reaction assays performed.
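
The fitting step described above can be illustrated with a minimal sketch. The source does not state which fitting software or algorithm was used; the routine below is only one possible approach, assumed for illustration. It exploits the fact that, for a fixed Km, the Michaelis-Menten model is linear in Vmax, and then searches over Km by golden-section search (which assumes the error has a single minimum in the search range). The function names and search bounds are hypothetical.

```python
import math


def michaelis_menten(s, vmax, km):
    """Initial velocity predicted by the Michaelis-Menten equation."""
    return vmax * s / (km + s)


def best_vmax(km, substrate, velocity):
    """For a fixed Km the model is linear in Vmax, so least squares is closed-form."""
    x = [s / (km + s) for s in substrate]
    return sum(xi * vi for xi, vi in zip(x, velocity)) / sum(xi * xi for xi in x)


def sse(km, substrate, velocity):
    """Sum of squared residuals at a given Km, using the best Vmax for that Km."""
    vmax = best_vmax(km, substrate, velocity)
    return sum((v - michaelis_menten(s, vmax, km)) ** 2
               for s, v in zip(substrate, velocity))


def fit_michaelis_menten(substrate, velocity, km_lo=1e-3, km_hi=1e3, iters=100):
    """Golden-section search over log10(Km); returns (vmax, km).

    The bounds km_lo and km_hi are illustrative assumptions and must be
    given in the same concentration unit as the substrate values.
    """
    phi = (math.sqrt(5) - 1) / 2
    a, b = math.log10(km_lo), math.log10(km_hi)
    for _ in range(iters):
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if sse(10 ** c, substrate, velocity) < sse(10 ** d, substrate, velocity):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    km = 10 ** ((a + b) / 2)
    return best_vmax(km, substrate, velocity), km


if __name__ == "__main__":
    # Synthetic, noise-free data (not measured values): [S] in µM, v in nM/min.
    s = [0.25, 0.5, 1.0, 2.0, 5.0]
    v = [michaelis_menten(x, 18.0, 0.76) for x in s]
    vmax, km = fit_michaelis_menten(s, v)
    enzyme_nM = 2.0  # kcat = Vmax / [E]
    print(f"Vmax = {vmax:.2f} nM/min, Km = {km:.3f} µM, kcat = {vmax / enzyme_nM:.2f} min^-1")
```

With real data, replicate measurements and a check that less than about 10% of substrate is consumed would precede any such fit, as the text requires.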

Stability of NS3 protease domain-4Apep(21-34)KK complex. The complex
was formed by mixing the NS3 protease domain (20 µM stock solution) and
4Apep(21-34)KK (1 mM stock) to a final concentration of 100 nM and
25 µM, respectively, in 50 mM Hepes (pH 7.5), 10 mM DTT, and 50%
glycerol and incubated at 25°C. Activity was monitored as a function of
time over 24 h.

RESULTS AND DISCUSSION 

Expression and purification. The HCV protease was found to be localized
in inclusion bodies when the plasmid DNA encoding the HCV protease was
transformed into BL21DE3 pLysS E. coli cells and induced with IPTG at
37°C. However, lowering the induction temperature to 25°C immediately
following addition of IPTG and allowing the cells to grow for only 2 h
resulted in the accumulation of HCV protease in the soluble fractions
prepared from the host cells. Treatment with DNase prior to cell
disruption rendered the solution less viscous upon cell lysis but it
had no effect on solubility of the recombinant NS3 protease. We used a
rapid two-step purification scheme for the purification of the
protease. The supernatant obtained subsequent to centrifugation of
cellular particulate was fractionated by SP Sepharose HP and heparin
affinity chromatography and the eluent was analyzed by SDS-PAGE. A
significant purification (>60% purity) was obtained with the SP
Sepharose HP column. The second step of the Hi-trap heparin column
yielded further purity to homogeneity (>99%) as shown in Fig. 1. The
two-step procedure described here was rapid and resulted in a
homogeneous preparation of NS3 protease without using any detergent
during the whole purification procedure.

FIG. 1. Reduced SDS-PAGE (16%) analysis of the NS3 protease domain of
human hepatitis C virus. The gel was stained with Coomassie brilliant
blue. Lane 1, molecular weight markers. Lane 2, crude bacterial lysate.
Lane 3, supernatant of crude lysate. Lane 4, pellet from crude lysate.
Lane 5, pooled SP Sepharose HP fractions. Lane 6, pooled Hi-trap
heparin fractions.

The purification of NS3 protease is summarized in Table 1. As judged
with the enzymatic specific activity, a 3100- and 40,700-fold increase
in activity was obtained with the SP Sepharose HP and Hi-trap heparin
columns, respectively. The initial activity observed may be low because
the enzyme may be inhibited in the crude supernatant. The final yield
of pure enzyme was 1 mg per liter of E. coli cell culture. Amino acid
sequence analysis of the purified enzyme revealed the N-terminus of the
purified NS3 protease to be P-I-T-A-... instead of the expected
M-A-P-I-T-A-... as deduced (6) from the cDNA sequence. Mass
spectrometry data also confirmed this result: a molecular mass of
18,867.8 (predicted average mass 18,868.6) was found for the purified
enzyme. This value is in agreement with a NS3 protease domain
containing amino acid residues from 1029 to 1206 without the encoded
N-terminal Met and Ala residues as shown by N-terminal sequence
analysis. Thus, the recombinant NS3 isolated from E. coli in these and
other (22) studies had deletion of the first two N-terminal amino
acids. The underlying mechanism of the cleavage is not clear.

TABLE 1

Purification of HCV NS3 Protease Domain

Fraction            Protein   Total protein   Specific activity       Fold
                    (mg/ml)   (mg)            (nmol·min⁻¹·mg⁻¹)       purification
Crude supernatant   11.5      1380            0.001 (a)
SP Sepharose        1.7       47.6            3.10                    3100
Heparin             0.3       10              40.7                    40,700

(a) This value is only a rough estimate of the upper limit of specific
activity. For this reason, yield of purified NS3 protease is not
included in this table.
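
The fold-purification figures quoted for Table 1 follow from dividing each step's specific activity by that of the crude supernatant. The short sketch below only reproduces that arithmetic from the tabulated values; the function and variable names are illustrative and not from the source. Because the crude value is described as an upper-limit estimate, the resulting ratios are best read as lower bounds.

```python
def fold_purification(specific_activity, crude_specific_activity):
    """Ratio of a fraction's specific activity to that of the crude supernatant."""
    return specific_activity / crude_specific_activity


# Specific activities in nmol·min^-1·mg^-1, taken from Table 1.
specific_activity = {
    "Crude supernatant": 0.001,  # rough upper-limit estimate per the table footnote
    "SP Sepharose": 3.10,
    "Heparin": 40.7,
}

crude = specific_activity["Crude supernatant"]
folds = {step: fold_purification(value, crude)
         for step, value in specific_activity.items()}

if __name__ == "__main__":
    for step, fold in folds.items():
        print(f"{step}: {fold:,.0f}-fold")
```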

In the past, several groups have reported purification of either the
full-length or the N-terminal protease domain of NS3 from various
sources such as E. coli, baculovirus, and mammalian host cells, and
have emphasized (20,22-27,29) the need of adding various detergents to
solubilize the protease in their purification procedures. In the
current protocol, however, detergent is not necessary for the enzyme to
be soluble throughout the course of the purification. Furthermore, the
enzyme could be easily concentrated to 4-5 mg/ml and stored at 4°C for
weeks without any apparent loss in activity.

Enzyme activity. To characterize the enzyme purified without detergent,
first we developed a peptide cleavage assay using as substrate two
peptides, both corresponding to the P6-P'6 residues of the NS4A/4B
cleavage site. To one (Ac-DEMEECASHLPYK-(ε-NHCOCH3); peptide I), a
lysine was introduced to the C-terminus to render it more soluble at
higher concentrations. To the other
(7-methoxycoumarin-4-acetyl-DEMEECASHLPYK-(ε-NHCOCH3); peptide II), a
coumarin fluorophore was introduced at the N-terminus to enhance
detection of the cleaved product.

The purified NS3 protease catalyzes the hydrolysis of both NS4A/4B
substrates at the expected Cys-Ala scissile bond. In the absence of
glycerol or at low glycerol concentrations, the rate of hydrolysis of
the NS4A/4B substrate as catalyzed by the protease was very low but
could be detected at high concentrations (high nM to low µM) of the
enzyme. However, at 50% glycerol concentration the pseudo second-order
rate constant for the hydrolysis of peptide II was determined to be
205 ± 20 M⁻¹ s⁻¹; this value was comparable to the reported value of 50
to 104 M⁻¹ s⁻¹ (21,24,26) for peptides mimicking the same cleavage
site. Shimizu et al. (15) have reported detection of no enzymatic
activity with a full-length MBP-NS3 protease on peptide substrates
representing the NS4A/4B cleavage site and only a marginal second-order
rate constant of 6 M⁻¹ s⁻¹ for a peptide (GDDIVPCSMSYTWT) representing
the NS5A/5B cleavage site. Recently, kcat/Km values ranging from 60 to
700 M⁻¹ s⁻¹ for NS3 protease domain and MBP-NS3 full-length protease
were reported (28,32) for the NS5A/NS5B cleavage site.

Preformed complex of NS3 protease and NS4A peptide. It has been
reported that the hydrophobic central core of the NS4A protein
activates the NS3 protease activity in peptide cleavage assays
(15,22,25,27,28). The effect of the 14-amino-acid peptide corresponding
to residues 21 to 34 of the NS4A protein (4Apep21-34) on the activity
of the NS3 protease has also been investigated with our detergent-free
preparation of NS3. In this study, the 4Apep21-34 peptide was modified
with the addition of two lysine residues to its C-terminus. The
resultant peptide, 4Apep21-34KK, showed enhanced solubility. It was
crucial for efficient activation by the 4Apep(21-34)KK to preform the
complex at high concentration as described under Materials and Methods.
For determining the kinetic parameters of the complex containing NS3
and a NS4A peptide, it was important that high concentrations of both
the protease and activator were maintained until the complex was ready
for assays so as not to allow dissociation of the components during the
course of the reaction and that steady-state kinetic conditions were
strictly observed.

For the determination of initial velocity, the enzymatic reaction was
first followed for a period of 20 min at the lowest substrate
concentration desired for substrate saturation. Formation of product as
a function of time was found to be linear for a length of only 4 min
with less than 10% substrate hydrolyzed (data not shown). Thus, all
initial velocity data were obtained (see Materials and Methods) for
reactions conducted for a period of less than 4 min. Figure 2 shows the
initial velocity for peptide hydrolysis, catalyzed by the
detergent-free preparation of NS3, as a function of substrate
concentration for peptide II in the presence of 4Apep(21-34)KK. The
observed kcat and Km values were 9 min⁻¹ and 0.76 µM, respectively,
resulting in a pseudo second-order rate constant (kcat/Km) of 196,000
M⁻¹ s⁻¹, an almost 1000-fold activation by the NS4A peptide, and at
least 130-fold greater than those reported for a similar substrate
(22,25,27).

FIG. 2. Substrate saturation and initial velocity of peptide hydrolysis
as catalyzed by the NS3 protease under steady-state kinetic conditions
(see text). The enzyme reaction was carried out in 50 mM Hepes
(pH 7.5), 10 mM DTT, and 50% glycerol in the presence of 0.25-5 µM
peptide II (see Results), 2 nM NS3, and 1.25 µM 4Apep(21-34)KK. The
reaction was allowed to continue for 2.5 min, within which less than
10% of the substrate (at the lowest concentration assayed) was
hydrolyzed. The data points shown were averages of 5 independent
measurements at each substrate concentration. The SE (<5%) were smaller
than the symbols used in the figure.
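
The pseudo second-order rate constant quoted above is kcat/Km after converting kcat from min⁻¹ to s⁻¹ and Km from µM to M. The sketch below is a plain unit-conversion check using the rounded values given in the text (9 min⁻¹, 0.76 µM, and 205 M⁻¹ s⁻¹ without cofactor); it is not the authors' calculation, and the function name is illustrative. Because the inputs are rounded, the result agrees with the reported 196,000 M⁻¹ s⁻¹ only to within its stated error.

```python
def catalytic_efficiency(kcat_per_min, km_micromolar):
    """Return kcat/Km in M^-1 s^-1 from kcat in min^-1 and Km in µM."""
    kcat_per_s = kcat_per_min / 60.0   # min^-1 -> s^-1
    km_molar = km_micromolar * 1e-6    # µM -> M
    return kcat_per_s / km_molar


# Peptide II with the 4Apep(21-34)KK cofactor, rounded values from the text.
with_cofactor = catalytic_efficiency(9.0, 0.76)

# Peptide II without cofactor, as reported in the text (M^-1 s^-1).
without_cofactor = 205.0

fold_activation = with_cofactor / without_cofactor

if __name__ == "__main__":
    print(f"kcat/Km with cofactor: {with_cofactor:,.0f} M^-1 s^-1")
    print(f"Activation by cofactor: about {fold_activation:,.0f}-fold")
```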

To rule out the possibility that kinetic parameters obtained for the
detergent-free NS3 for peptide hydrolysis were in any way affected by
the coumarin moiety at the N-terminus of the substrate, peptide I was
used as a control substrate. In identical assays substituting peptide I
for peptide II, the kcat and Km values observed were 8.6 min⁻¹ and
2.8 µM, respectively, yielding a kcat/Km value of 51,000 M⁻¹ s⁻¹, which
was significantly greater than the reported values of 90 to 1600
M⁻¹ s⁻¹ for the same cleavage site substrates (22,25,27,30,31). Thus,
the high pseudo second-order rate constant observed for catalysis by
the NS3 protease was not limited to peptides containing a coumarin
moiety. To further show that the detergent-free enzyme in general forms
a very active complex with the NS4A cofactor and that its high
catalytic efficiency was independent of substrate, a peptide mimicking
the NS4B/5A cleavage site,
7-methoxycoumarin-4-acetyl-EDASTPCSGS-Nph-L, was used as a substrate.
Values of 13 min⁻¹ and 5180 M⁻¹ s⁻¹ were obtained for kcat and kcat/Km.
The latter may be compared to the reported values of 80 M⁻¹ s⁻¹ for a
similar substrate (22). In presence of NS4A peptide (20-mer), kcat/Km
values ranging from 2670 to 20,000 M⁻¹ s⁻¹ were recently reported for
the NS5A/5B cleavage site (28,30,31,32). In our hands NS5A/5B
substrates were not sufficiently stable to allow us to accurately
determine the kinetic parameters. The steady-state kinetic parameters
obtained in this study with the detergent-free NS3 protease are
summarized in Table 2.

Stabilization of NS3 protease domain by 4Apep(21-34)KK cofactor. It is
possible that the NS4A protein may stabilize the NS3 protease in vivo
by protecting it from degradation by cellular enzymes (33). We examined
the stability of the NS3 protease domain in the presence of the NS4A
(21-34)KK peptide. The enzyme (100 nM) was incubated at 25°C in the
presence and absence of 4Apep(21-34)KK (25 µM) and the change in
activity was followed with increasing incubation time as shown in
Fig. 3. While there was no loss in activity observed in the presence of
NS4A peptide, only 20% of the NS3 protease activity remained after 24 h
at 25°C in incubations without the NS4A peptide. Even lower
concentrations of the enzyme (2 nM) showed no detectable loss of
activity in presence of 4Apep(21-34)KK for at least 24 h at 25°C (data
not shown).

TABLE 2

Kinetic Parameters of HCV NS3 Protease Domain with Peptide Substrates (a)

Peptide substrate                                       kcat (min⁻¹)   Km (µM)       kcat/Km (M⁻¹ s⁻¹)

(I) NS4A-NS4B site mimicking peptides
  Mca-DEMEEC*ASHLPYK-(ε-NHCOCH3) (b)                    0.36 ± 0.07    29.2 ± 3.4    205 ± 20
  Mca-DEMEEC*ASHLPYK-(ε-NHCOCH3) (c) + 4APep(21-34)KK   9.0 ± 0.02     0.76 ± 0.02   196,000 ± 9,000
  Ac-DEMEEC*ASHLPYK-(ε-NHCOCH3) (d) + 4APep(21-34)KK    8.6 ± 0.3      2.8 ± 0.3     51,000 ± 3,000

(II) NS4B-NS5A site mimicking peptide
  Mca-EDASTPC*SGSNphL (e) + 4APep(21-34)KK              13.6 ± 0.7     43.6 ± 7.9    5180 ± 670

(a) Kinetic parameters were determined as described under Materials and
Methods in 50 mM Hepes (pH 7.5), 10 mM DTT and 50% glycerol. Data are
mean values from at least three independent experiments.
(b) Mca, 7-methoxycoumarin-4-acetyl.
(c) The assay was performed with 2 nM enzyme and 1.25 µM 4Apep(21-34)KK
in 50 mM Hepes (pH 7.5), 10 mM DTT, and 50% glycerol buffer.
(d) The assay was carried out with 10 nM enzyme and 5 µM 4Apep(21-34)KK
in 50 mM Hepes (pH 7.5), 10 mM DTT, and 50% glycerol buffer.
(e) The assay was conducted with 5 nM enzyme and 10 µM 4Apep(21-34)KK
in 50 mM Hepes (pH 7.5), 10 mM DTT, and 50% glycerol buffer.

FIG. 3. Activity of NS3 protease (○) and NS3 protease/4Apep(21-34)KK
complex (●) as a function of time. NS3 protease and NS3
protease/4Apep(21-34)KK complex were incubated in 50 mM Hepes (pH 7.5),
10 mM DTT, and 50% glycerol at 25°C and activity was determined by
substrate cleavage assay at various time intervals. Activity at time
zero was taken as 100% activity.

Complex of the NS3 protease and 4A cofactor. Our kinetic data obtained
for the NS3 protease in the presence of the NS4A cofactor suggest that
the affinity of NS4A peptide to the NS3 protease domain may be very
different under our assay conditions from those reported previously. It
has been reported that maximal activity is achieved at 1:1 molar ratio
of NS3 protease domain and the NS4A peptide (22). In the absence of
detergents, a 10- to 14-fold excess of NS4A peptide has been used to
attain a maximal activation of the MBP-NS3 fusion protein
(kcat/Km = 250 M⁻¹ s⁻¹) and the protease domain (kcat/Km = 2670
M⁻¹ s⁻¹) in catalysis of hydrolysis of NS5A/5B cleavage-site-mimicking
substrates (15,28). In the present study, we have demonstrated that
under detergent-free conditions the NS3 protease domain interacts with
the NS4A peptide to yield a significantly more active catalytic
complex, although at a significantly higher NS4Apep(21-34)KK to NS3
ratio of 625, due to a moderate Kd of ~20 µM of the cofactor to achieve
complete saturation of the protease. We note that the values of the
rate constants (kcat, kcat/Km) shown in Table 2 are likely
underestimates because they are not obtained in the presence of
theoretical saturating amounts of 4Apep(21-34)KK. In 20%
glycerol-containing buffers, the affinity of NS4A peptide to NS3
protease domain is weaker than that in solution containing 50% glycerol
(37), and complete saturation is not attainable due to the limited
solubility of the NS4A peptide (150 µM in assay buffer). The high
activity of the complex correlates well with the recently published
three-dimensional structures of the complexed and uncomplexed forms of
the NS3 protease domain (34-36). In the presence of the 4A peptide the
NS3 protease catalytic triad forms a chymotrypsin-like fold, similar to
other serine proteases, with the optimal orientation of the active site
Asp, His, and Ser residues, which is not properly formed in the
uncomplexed form (35). Also the stabilization effect of 4A peptide is
evident by the interactions observed with the N-terminus of the
protease. The 4A peptide beta sheet becomes part of the N-terminal
domain beta sheet of the protease and stabilizes the structure. We and
others (22,37) have found that 4A binding as well as the proteolytic
activity of the complex increases with increasing concentrations of
glycerol, with maximal activity at ~50% glycerol. In vivo, the
interaction between NS4A and NS3 is likely to be substantially more
complicated since the helicase domain (29,38-42), encoded downstream
and in frame with the protease as a single NS3 polypeptide, is likely
to be part of an integral protease-helicase-activator multienzyme
complex and plays a critical role in regulating virus replication.
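
The remark that the Table 2 rate constants are likely underestimates can be made concrete with a rough calculation. The sketch below assumes a simple 1:1 hyperbolic binding model with the cofactor in large excess over enzyme, so that free cofactor is approximately equal to total cofactor. This model is an illustrative assumption and is not stated in the source; it uses the ~20 µM dissociation constant and the cofactor and enzyme concentrations quoted in the text. It also ignores the preincubation at higher concentration described under Materials and Methods, so the numbers are indicative only.

```python
def cofactor_to_enzyme_ratio(cofactor_uM, enzyme_nM):
    """Molar ratio of cofactor peptide to enzyme (µM converted to nM)."""
    return cofactor_uM * 1000.0 / enzyme_nM


def fraction_bound(cofactor_uM, kd_uM=20.0):
    """Fraction of enzyme in complex under an assumed 1:1 binding model.

    Assumes cofactor >> enzyme, so free cofactor ~ total cofactor.
    kd_uM defaults to the approximate value quoted in the text.
    """
    return cofactor_uM / (kd_uM + cofactor_uM)


ratio = cofactor_to_enzyme_ratio(1.25, 2.0)  # conditions used for peptide II
occupancy = {c: fraction_bound(c) for c in (1.25, 5.0, 10.0, 25.0)}

if __name__ == "__main__":
    print(f"Cofactor:enzyme ratio = {ratio:.0f}")
    for conc, frac in occupancy.items():
        print(f"{conc:>5} µM cofactor -> {100 * frac:.0f}% of enzyme complexed (model estimate)")
```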

CONCLUSION 

A rapid purification procedure in the absence of de- 
tergents has been devised for the NS3 protease domain 
(residues 1027 to 1206) of the human hepatitis C virus, 
and an efficient in vitro assay is described to demon- 
strate that detergents are not required for the purifi- 
cation, solubility, stability, and for the optimal activity 
of the recombinant NS3 protease. The activity of the 
HCV NS3 protease is comparable to those reported for other viral
proteases such as the HIV-1 protease (43), adenovirus protease (44),
cytomegalovirus protease (45,46), and herpes simplex type-1 protease
(47). The highly active noncovalent complex of the N-terminus domain of
NS3 protease and NS4A peptide along with the more stable NS4A/4B
cleavage site-mimicking peptide substrate will be useful tools for
screening inhibitors as therapeutic leads for HCV therapy.

Abstract The serine protease domain of HCV comprising amino acids
1027-1218 (ΔNS3) was expressed in E. coli with a His tag at its
N-terminal end. The protease was purified to apparent homogeneity by a
single step affinity chromatography resulting in high yields (~3 mg/l
of cultured cells). The ΔNS3 efficiently cleaves a 17-mer peptide
corresponding to the NS5A-NS5B junction with kcat/Km = 160 × 10"*
min⁻¹ µM⁻¹ in the presence of NS4A peptide. Our ΔNS3 represents the
minimal domain possessing highly active protease of NS3 constructed so
far. The ΔNS3 protein also efficiently processed a longer substrate
corresponding to NS5A/5B junction (2203-2506 amino acids) that was
synthesized by in vitro transcription and translation system.

Key words: Hepatitis C virus; Serine protease; 
Non-structural protein 3 



1. Introduction 

Hepatitis C virus (HCV), the major etiological agent of
post-transfusion non-A, non-B hepatitis, is an enveloped virus
containing a single-stranded RNA genome of approximately 9.5 kb [1,2].
A single polyprotein of 3010-3030 amino acids is translated from this
genome [3] in the order of
NH2-C-E1-E2-p7-NS2-NS3-NS4A-NS4B-NS5A-NS5B-COOH (Fig. 1A) [4,5]. This
polyprotein is subsequently processed by a combination of host and
viral proteases to produce at least 10 viral proteins. The core protein
(C) and envelope proteins (E1 and E2) are structural proteins, and NSs
are non-structural proteins [6,7].

Previous studies indicate that a host signal peptidase localized in
endoplasmic reticulum (ER) catalyzes polyprotein cleavages in the
structural region (C/E1, E1/E2, E2/p7 and p7/NS2) [8,5], whereas a HCV
encoded serine protease located in the N-terminal one-third of the NS3
protein is responsible for cleavages at four sites (3/4A, 4A/4B, 4B/5A
and 5A/5B) (Fig. 1A) [9-14] in the NS region. Using a transient
coexpression system it has been shown that proteolytic cleavages in the
non-structural protein region (NS3 to NS5B) of HCV polyprotein are
effected by two viral proteins, NS3 and NS4A [15-19]. The N-terminal
180 amino acid region of NS3 includes sequences showing homology with
the active sites of serine proteases [20-22]. Histidine 1083, aspartate
1107 and serine 1165 (numbers are according to their locations in the
polyprotein of HCV subtype J (HCV-1b) [23,24]) found in this domain
have been proposed to constitute the catalytic triad of the NS3
protease similar to other serine proteases belonging to the
chymotrypsin family.

NS4A is shown to be the NS3 protease cofactor or effector enhancing
cleavage efficiency at various sites NS3/4A, NS4A/4B, NS4B/5A and
NS5A/5B [16,19,25-27]. NS4A is an amphipathic protein of 54 amino acids
and it has a very hydrophobic N-terminal domain followed by a
hydrophilic C-terminal domain [16]. Using an in vitro reconstituted
assay system it has been shown that the residues 22-31 of NS4A
constitute the putative core sequence for NS4A's effector activity
[27]. The mechanism by which NS4A facilitates cleavage remains obscure.
Truncation experiments have mapped the N-terminus of NS3 as the domain
responsible for interaction with NS4A [25,27].

Since the NS3 protein is very important for releasing functional
proteins from the polyprotein, it is currently being targeted in the
development of drugs and diagnostics. As a matter of fact, several
known serine protease inhibitors have been tested for their action on
NS3 activity in vitro and it was found that millimolar concentrations
of these compounds were required to show moderate inhibitory effect on
NS3 protease activity [3,28]. One major obstacle for these well known
inhibitors may be the presence of the helicase domain; alternatively,
the NS3 protease may be structurally different from other serine
protease domains.

In order to carry out detailed characterization of this enzyme in terms
of its substrate specificity, kinetics, sensitivity to inhibitors and
for structural studies, a reproducible and convenient large-scale
purification of the enzymatically active protease domain of NS3 is
essential. In this report we show that the region encompassing amino
acids 1027-1218 tagged with the His tail (6 His) has greater activity
than other expression constructs made so far and the purification
procedure is much simpler. We also show that the purified protein
possesses the ability to interact with NS4A resulting in increased
proteolytic activity.

2. Materials and methods 

2.1. Construction of expression plasmid containing the HCV
serine protease domain
To construct the expression plasmid pHisBΔNS3, a cDNA fragment encoding
amino acid residues 1027-1218 in the HCV polyprotein was obtained by
PCR using appropriate oligonucleotides
(1: d(CCGCTGCAGCCATGGCGCCTATCACGGCCTAT),
2: d(CCGAAGCTTTCAGGCCGGAGGGGATGAGTT)), which insert a PstI site at the
5'-end and a TAG (stop) codon and HindIII site at the 3'-end of the
sequence. We used plasmid pMANS34NSH [29] as a template to amplify
amino acid residues 1027-1218 in the HCV polyprotein. The PCR product
amplified using Ex Taq (Takara) was first cloned into pCR II vector
(Invitrogen) according to the manufacturer's instructions and digested
with PstI and HindIII to release the fragment encompassing amino acids
1027-1218 of HCV polyprotein. It was subsequently cloned into
expression plasmid pTrcHisB (Invitrogen) which was also digested with
PstI and HindIII. The resulting plasmid pHisBΔNS3 (Fig. 1B,C) encodes
the protease domain of NS3 (192 amino acids) with 40 N-terminal
non-virus-encoded amino acids possessing a consecutive stretch of 6 His
residues that allows the fusion protein to be purified in a single step
by metal chelating affinity chromatography. The cloned DNA fragment was
sequenced in order to exclude the introduction of mutations by PCR and
also to confirm that the insert is in frame.

Fig. 1. Schematic representation of HCV serine proteinase domain
expressed in E. coli. A: The HCV genome and a translated product with
names of the processed proteins. The filled arrow shows the cleavage
site of HCV serine protease, whereas the dotted arrow indicates the
cleavage site of HCV metalloprotease. B: ΔNS3. Closed and open boxes
indicate the 6 histidine tag and the protease domain, respectively.
C: N-terminal extra amino acid sequence and NS3 protease domain.
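
The primer design described above can be checked with a short sketch that locates the PstI (CTGCAG) and HindIII (AAGCTT) recognition sequences in the two oligonucleotides and translates the forward primer from its first ATG. This is an illustrative check only, assuming the primer sequences were transcribed correctly and using the standard genetic code; the helper names are hypothetical and not part of the source.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

RECOGNITION_SITES = {"PstI": "CTGCAG", "HindIII": "AAGCTT"}


def find_sites(seq):
    """Return the 0-based position of each recognition site, or -1 if absent."""
    return {name: seq.find(site) for name, site in RECOGNITION_SITES.items()}


def translate(seq):
    """Translate complete codons from the start of seq ('*' marks a stop)."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


forward = "CCGCTGCAGCCATGGCGCCTATCACGGCCTAT"
reverse = "CCGAAGCTTTCAGGCCGGAGGGGATGAGTT"

if __name__ == "__main__":
    print("Forward primer sites:", find_sites(forward))
    print("Reverse primer sites:", find_sites(reverse))
    start = forward.find("ATG")
    print("Forward primer reading frame from ATG:", translate(forward[start:]))
```

Each primer carries its restriction site three nucleotides from the 5' end, and the forward primer reads Met-Ala-Pro-Ile-Thr-Ala-Tyr from its ATG.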



2.3. HCV NS3 protease assays

To characterize the enzymatic activity of the purified protease (ΔNS3)
we investigated its ability to cleave a synthetic peptide, S-1
(Dns-Gly-Glu-Ala-Gly-Asp-Asp-Ile-Val-Pro-Cys-↓-Ser-Met-Ser-Tyr-Thr-Trp-Thr-COOH;
↓, cleavage site) corresponding to the cleavage site of the NS5A/5B.
Protease activity of ΔNS3 and maltose binding NS3 fusion protein,
MBP-NS3 (amino acids 985-1647) [29] was analyzed either in the presence
or absence of NS4A peptide, P41 (amino acids 1673-1692 of HCV
polyprotein).

Kinetic constants were determined from enzyme assays with the synthetic
substrate concentration ranging from 25 to 960 µM. The
K m and values were determined by Lineweaver-Burk plots. Kinetic
reactions were analyzed either in the presence or absence of P41. For
evaluating the rate of reaction in the absence of P41, the ΔNS3 or
MBP-NS3 (0.72 mM) was incubated with various concentrations of
substrate in a buffer (Tris-HCl (pH 7.8), 30 mM NaCl, 5 mM CaCl2,
10 mM DTT) at 25°C for 60 min, whereas reactions carried out in the
presence of P41 (10 mM) were initially equilibrated with ΔNS3 or
MBP-NS3 for 15 min at 37°C in the above buffer. The reaction was
initiated by addition of substrate at 37°C and data points were
collected for 10 min.
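
A Lineweaver-Burk analysis of the kind mentioned above linearizes the Michaelis-Menten equation as 1/v = (Km/Vmax)(1/[S]) + 1/Vmax, so that a straight-line fit of 1/v against 1/[S] gives Vmax from the intercept and Km from the slope. The sketch below shows that calculation with ordinary least squares on synthetic data. The numbers are invented for illustration and are not results from this study, and the function name is hypothetical.

```python
def lineweaver_burk(substrate, velocity):
    """Fit 1/v = (Km/Vmax) * (1/[S]) + 1/Vmax by least squares; return (vmax, km)."""
    x = [1.0 / s for s in substrate]
    y = [1.0 / v for v in velocity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / intercept   # intercept = 1/Vmax
    km = slope * vmax        # slope = Km/Vmax
    return vmax, km


if __name__ == "__main__":
    # Synthetic example spanning the 25-960 µM substrate range used in the assays.
    substrate_uM = [25.0, 50.0, 100.0, 200.0, 400.0, 960.0]
    true_vmax, true_km = 2.0, 100.0  # arbitrary illustrative values
    velocity = [true_vmax * s / (true_km + s) for s in substrate_uM]
    vmax, km = lineweaver_burk(substrate_uM, velocity)
    print(f"Vmax = {vmax:.3f} (velocity units), Km = {km:.1f} µM")
```

Because taking reciprocals amplifies the error at low substrate concentrations, the double-reciprocal fit is sensitive to noise in those points; a direct nonlinear fit is a common alternative.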

In order to test the ability of the ΔNS3 to cleave a longer substrate
representing the NS5A/5B junction (amino acids 2203-2506 of the HCV
polyprotein), which closely resembles the situation in vivo,
radiolabelled NS5A/5B substrate was produced using a coupled
transcription-translation system (TNT, Promega) according to the
manufacturer's protocol. The DNA encoding amino acids 2203-2506 of the
polyprotein representing the NS5A/5B site was amplified by PCR and was
added to a 25 µl TNT reaction in the presence of [35S]methionine
(Amersham) at 300 µCi/ml and incubated at 30°C for 1-2 h. For
experiments to assess proteolytic cleavage of pre-formed substrate the
TNT reaction mixture was diluted by adding an equal volume of buffer
(100 mM HEPES (pH 7.6), 300 mM NaCl, 6 mM MgCl2, 20 mM DTT). To 6 µl
(≈300 cpm of translated product) of diluted TNT reaction mixture, ΔNS3
(750 nM) was added and incubated for 30 min at 30°C. In experiments to
study the effect of NS4A peptide, the protease was pre-incubated with
P41 for 5 min at 30°C before the addition of substrate. Samples were
withdrawn at different time intervals and the reaction was stopped by
adding SDS-PAGE sample buffer followed by denaturing at 95°C for 3 min.
All samples were loaded on to SDS-PAGE and the radioactivity was
quantitated by image analyzer (BAS 2000, Fuji Film).



2.2. Expression and purification of HCV NS3 protease domain 

Following transformation of E. coli HB101, cells harboring the
vector pHisBANS3 were grown in LB medium containing 100 µg/ml
ampicillin. When the absorbance reached a value of 0.5-0.6 OD,
isopropyl-1-thio-β-D-galactopyranoside (IPTG) was added to give a
final concentration of 1 mM and the incubation was continued for
an additional 2 h at 37°C. Under this induction condition a high level
of expression of ANS3 was observed and there was only a small amount
present in the insoluble fraction. The cells were harvested by
centrifugation and washed extensively with PBS (20 mM sodium phosphate;
pH 7.4, 140 mM NaCl). The cell pellet was resuspended in lysis buffer
(20 mM sodium phosphate; pH 6.3, 500 mM NaCl) and disrupted by
sonication on ice using a Branson 200 sonifier (30 s × 6 strokes at 18
W output with 30 s intervals). The homogenate was centrifuged at
30 000 × g for 30 min to remove cell debris and was chromatographed
on a nickel-agarose column (Qiagen). We applied a stepwise gradient
of pH to elute the protease in order to overcome the precipitation of
the enzyme. The column was washed extensively with several column
volumes of lysis buffer, subsequently washed using the same buffer
at pH 6.0 and finally with the same buffer at pH 5.0. Resin-bound
protein was eluted with sodium phosphate buffer (pH 4.0) containing
500 mM NaCl. Eluted fractions were subjected to SDS-PAGE [30];
protein-containing fractions were pooled and concentrated using a
Millipore Ultrafree Biomax 10K concentrator (Millipore). The enzyme
was stored in aliquots at -20°C in the same buffer containing
40% glycerol.

Protein concentrations were estimated from UV absorbance at
280 nm: an extinction coefficient of ε = 20 800 M⁻¹ cm⁻¹ was calculated
on the basis of primary sequence data according to published
procedures [31] and the concentration of ANS3 was determined according
to the Lambert-Beer law. Alternatively, ANS3 concentration was
determined by Bradford (BioRad) assay using lysozyme as standard.
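
As a worked illustration of the Lambert-Beer calculation, the sketch below converts an absorbance reading at 280 nm into a molar concentration using the extinction coefficient quoted above. The absorbance value of 0.52 and the 1 cm path length are assumptions chosen for the example, not measurements from this work.

```python
def molar_concentration(a280, epsilon=20800.0, path_cm=1.0):
    """Lambert-Beer law: A = epsilon * c * l, so c = A / (epsilon * l).

    epsilon is in M^-1 cm^-1 and path_cm in cm; the result is in mol/l.
    """
    return a280 / (epsilon * path_cm)


# Hypothetical absorbance reading of 0.52 in a 1 cm cuvette.
conc_M = molar_concentration(0.52)
conc_uM = conc_M * 1e6
print(round(conc_uM, 1), "uM")
```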



3. Results and discussion 

The purified ANS3 protease migrated as a single band with
a molecular mass consistent with that calculated from the
primary sequence (molecular mass of 25 kDa as judged
from SDS-PAGE) (Fig. 2). The final yield was ~3 mg of
purified protein per liter of cultured cells and the purity of
the preparation was estimated to be over 95% as judged by SDS
gel. Although several constructs have been reported for the
expression of NS3, the yield and the simplicity of the
purification procedure described here offer a clear advantage
for structural studies, in which large amounts of enzyme are
required. Table 1 shows that in the absence of
P41 the Km of ANS3 was 250 µM and the kcat was 2.0 min⁻¹. In



Table 1

Effect of P41 on the cleavage kinetics of peptide NS5A/5B by ANS3
and MBP-NS3*

Enzyme         Km (µM)    kcat (min⁻¹)    kcat/Km (min⁻¹ µM⁻¹)

ANS3           250        2.0             8 × 10⁻³
ANS3+P41       99         15.8            160 × 10⁻³
MBP-NS3        360        1.3             3.6 × 10⁻³
MBP-NS3+P41    196        5.1             26 × 10⁻³

*Data are the mean values from three independent experiments.

Fig. 2. Purification of the NS3 protease domain. Samples deriving
from single steps of the purification were loaded on an SDS-15%
polyacrylamide gel, and bands were visualized by Coomassie staining.
Lane 1: molecular mass markers; lane 2: homogenate from E.
coli without construct; lane 3: homogenate from pTrcHisBANS3;
lane 4: 750 ng of purified ANS3 after affinity chromatography; lane
5: 1.5 µg of purified ANS3 after affinity chromatography.

contrast, in the presence of a 10-fold molar excess of P41 the
Km (99 µM) was about 2.5 times lower than that of
ANS3 alone, and the kcat (15.8 min⁻¹) was about 8 times
higher than that of ANS3 alone. As a result the kcat/Km value
increased 20 times in the presence of P41. Compared with
the kcat/Km value of MBP-NS3, the ANS3 protease activity was
about 2-fold higher in the absence and 6-fold higher in the presence of
P41. Our ANS3 protease activity was found to be the highest
reported so far. The data presented above suggest that region
1027-1218 represents the minimal domain required for the
protease activity, since the region 1027-1218 reproduces all
the cleavage kinetics of the MBP-NS3 (985-1647) reported
earlier [27].
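
The fold changes quoted in this paragraph follow directly from the Km and kcat values in Table 1. The short sketch below recomputes kcat/Km and the ratios; the dictionary layout and variable names are illustrative only.

```python
# (Km in uM, kcat in min^-1), taken from Table 1.
table1 = {
    "ANS3": (250.0, 2.0),
    "ANS3+P41": (99.0, 15.8),
    "MBP-NS3": (360.0, 1.3),
    "MBP-NS3+P41": (196.0, 5.1),
}

# Catalytic efficiency kcat/Km in min^-1 uM^-1.
efficiency = {name: kcat / km for name, (km, kcat) in table1.items()}

# Effect of P41 on ANS3, and ANS3 relative to MBP-NS3 without and with P41.
fold_p41 = efficiency["ANS3+P41"] / efficiency["ANS3"]
fold_vs_mbp = efficiency["ANS3"] / efficiency["MBP-NS3"]
fold_vs_mbp_p41 = efficiency["ANS3+P41"] / efficiency["MBP-NS3+P41"]

for name, value in efficiency.items():
    print(f"{name}: kcat/Km = {value:.4f}")
print(f"{fold_p41:.1f} {fold_vs_mbp:.1f} {fold_vs_mbp_p41:.1f}")
```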

The coefficient for proteolytic efficiency was comparable to
that of two reports published recently [27,32], whereas the Km
and kcat values are comparable only with those of NS3 (encompassing
both the protease and helicase domains fused to
MBP) [27]. In the case of ANS3, the presence of P41 clearly
increases the affinity for the substrate (as evident from the Km
values) and thereby increases the catalytic rate (Table 1).
Shimizu et al. [27] have made a similar observation with NS3,
which has both protease and helicase functional domains,
whereas Steinkühler et al. [32] have expressed an NS3 protease
domain encompassing amino acids 1038-1226 of HCV polyprotein
and analyzed its activity on the NS4A/NS4B cleavage site
both in the presence and in the absence of NS4A protein.
Though the kinetic differences were observed using a peptide
derived from a NS4A-independent cleavage site (NS4A/4B),
the data clearly show the activation role of NS4A in altering
the NS3 activity in general. However, the affinity (Km) was
not altered in the presence of NS4A as evident from their

Fig. 3. TNT analysis of cleavage of NS5A/5B substrate by ANS3.
Substrate representing NS5A/5B incubated, lane 1: without ANS3;
lane 2: with ANS3; lane 3: with both ANS3 and P41. All samples
were incubated at 30°C for 30 min; proteins were labelled with
[35S]methionine and analyzed by SDS-PAGE followed by fluorography.

published data. This may possibly be due to the absence of
essential N-terminal amino acids for interaction with NS4A in
their expressed protease domain. The fact that NS4A can
interact with ANS3 in a similar way as with full NS3 and
brings about similar changes confirms that ANS3, despite its
truncation and fusion, retains the favorable conformation
for interaction. Based on the above observations we can
conclude that amino acid residues 1027-1218 are sufficient to
represent the complete protease function of NS3, even considering
its ability to interact with NS4A. Detailed analyses of

Fig. 4. TNT analysis on effect of P41 on ANS3 rate of cleavage.
Normalized cleavage % is shown against different time intervals. 
Without P41 (D) and with P41 (0). 

the regulation and the substrate requirements of the protease
are expected to offer further insights into the mechanism of 
activation by P41. 

In addition, when we tested the ability of ANS3 to cleave a
longer substrate representing the NS5A/B junction, as shown
in Fig. 3, ANS3 could cleave the NS5A/B junction protein
(amino acids 2203-2506), in close agreement with the observation
made when the small synthetic peptide substrate NS5A/B was
used. The addition of P41 again clearly enhanced the cleavage
reaction. From the time course of these reactions (Fig. 4) the
rate of cleavage was stimulated more than 2-fold by the addition
of P41. Since the purified recombinant ANS3 protease has
a relatively small molecular size while representing the complete
catalytic function of NS3 protease, and considering the ease
and yield of purification, we believe that it is an ideal candidate
for further studies elucidating the structure-function
relationships and the mechanism of interaction with the NS4A
peptide, as well as for developing protease inhibitors.

The Solution Structure of the N-terminal Proteinase
Domain of the Hepatitis C Virus (HCV) NS3 Protein 
Provides New Insights into its Activation and 
Catalytic Mechanism 

IRBM "P. Angeletti"
Via Pontina km 30.600
00040 Pomezia, Roma, Italy

The solution structure of the hepatitis C virus (BK strain) NS3 protein
N-terminal domain (186 residues) has been solved by NMR spectroscopy.
The protein is a serine protease with a chymotrypsin-type fold, and is
involved in the maturation of the viral polyprotein. Despite the
knowledge that its activity is enhanced by the action of a viral protein
cofactor, NS4A, the mechanism of activation is not yet clear. The
analysis of the folding in solution and the differences from the
crystallographic structures allow the formulation of a model in which,
in addition to the NS4A cofactor, the substrate plays an important role
in the activation of the catalytic mechanism. A unique structural
feature is the presence of a zinc-binding site exposed on the surface,
subject to a slow conformational exchange process.

Introduction

Hepatitis C virus (HCV) is recognised as the principal
etiologic agent of parenterally transmitted
non-A, non-B hepatitis (NANB-H; Choo et al., 1989;
Kuo et al., 1989). The virus establishes a chronic
infection that persists for decades in at least 85% of
the infected individuals, and up to 70% develop
chronic active hepatitis. Chronic infection ultimately
leads to the development of liver cirrhosis
and hepatocellular carcinoma. Neither a vaccine
against viral infection nor effective therapy for
HCV-associated chronic hepatitis has been developed
to date. With an estimated world-wide population
of infected people of more than 150 million,
HCV represents one of the most widely spread and
challenging viral infections to block.

The HCV virion has a positive strand RNA genome
of about 9.6 kb that encodes a polyprotein of
about 3000 amino acid residues (Houghton, 1996).
The genetic organisation of HCV is similar to that
of Flavi- and Pestiviruses and it was classified as a
separate genus of the Flaviviridae family. The analysis
of the sequence of HCV reveals that it exists in
at least six major genotypes and 11 subtypes
(Simmonds, 1994). However, all known HCV polyprotein
sequences share at least 71% identity. The
structural protein "core" and the envelope glycoproteins
E1 and E2 are released from the N-terminal
portion of the polyprotein by action of cellular
peptidases, while the non-structural proteins
involved in the replication of HCV are released by
the action of two virus-encoded proteinases: NS2-3
and NS3 (for reviews, see Bartenschlager, 1997;
Neddermann et al., 1997). NS2-3 is a zinc-dependent
proteinase that performs a single proteolytic
cut to release the N terminus of NS3. The proteolytic
cleavage at the NS3/NS4A, NS4A/NS4B,
NS4B/NS5A and NS5A/NS5B junctions is mediated
by a serine proteinase contained within the
N-terminal 180 amino acid residues of NS3. The
C-terminal region (residues 180-630) of NS3 has been
demonstrated to possess helicase activity. It has
also been shown that the action of a cofactor,
NS4A, enhances the activity of the serine proteinase
in all the cleavages, via the formation of an
NS3/NS4A complex. Interaction with the NS4A 
cofactor is required to perform the cleavages at 
NS3/NS4A, NS4A/NS4B and NS4B/NS5A junc- 
tions but the proteinase in its uncomplexed state is 
still able to cleave at the NS5A/NS5B boundaries, 
although with a much lower activity. 

Since the NS3 proteinase is involved in the 
maturation process of the virus, the study of the 
structure of this enzyme is of crucial importance 
from a pharmacological point of view, in that it 
can give a strong impulse to the design of inhibi- 
tors that may prevent its action and thus block the 
viral replication and spread. 

The crystallographic structures of the free
enzyme (Love et al., 1996; for simplicity in the
following we refer to it as ns3) and its complex with
a peptide representing the central region of the
NS4A protein cofactor complex (Kim et al., 1996;
Yan et al., 1998; we refer to it as ns3-4a) have been
solved. The overall topology is similar in both
structures, and forms an N-terminal (approximately
residues 1-93) and a C-terminal (residues
94-180) six-stranded anti-parallel β-barrel. The barrels
are packed like those of chymotrypsin-like serine
proteinases. The catalytic site is formed by the
triad of residues H57, D81 and S139, and is found
in the crevice between the two domains. In
addition to the β-barrels, there are two helical segments:
α1 (residues 56-60), comprising the catalytic
histidine residue, and α2 (residues 131-137), present
in both the ns3 and ns3-4a structures. Two
additional helices, α0 (residues 13-21) and α3
(residues 172-180), are formed only in the ns3-4a
structure (Kim et al., 1996; Yan et al., 1998).

The commonly accepted mechanistic model of
action of the serine proteinases implies a relay
mechanism of hydrogen bonds involving, on one
side, the carboxylate moiety of the Asp and the δ
HN of the His residues and, on the other side, the
ε N of the His and the γ HO of the Ser residues.
This relay of H-bonds activates the γ O of the Ser
residue, which can produce the nucleophilic attack
on the C atom of the scissile bond (Fersht, 1984;
Polgar, 1989; Lesk & Fordham, 1996).

In the ns3 structure, the carboxyl group of D81
is positioned far from and points away from H57
(Love et al., 1996), impairing the H-bond formation.
Conversely, in the ns3-4a structure, the side-chain
of D81 is within hydrogen-bonding distance from
the H57 imidazole group (Kim et al., 1996; Yan
et al., 1998). By comparison of the ns3 and ns3-4a
structures it has been inferred that D81 is correctly
positioned as a member of the canonical catalytic
triad as a consequence of the presence of the NS4A
cofactor (Love et al., 1998).

This conclusion, reasonable as it may seem from
the crystallographic data alone, is challenged by
newly available biochemical evidence. In fact, the
pH-dependence of the hydrolysis reaction studied in
the presence of substrate and with or without NS4A
titrates with the identical pKa value of 7.2 (Landro
et al., 1997). The authors conclude that the activation
role of NS4A is not exerted by a perturbation of the
pKa values of the active-site residues involved in
the catalysis, in contrast with the model proposed
by Love et al. (1998). On the other hand, NMR pH
titration of the catalytic H57 residue of the free
enzyme gives a pKa value of 6.8 (Urbani et al., 1998).
Assuming that the pKa measured by Landro et al.
(1997) reflects the actual value for H57, the difference
in pKa value from that of the free enzyme
could be due either to a direct role of the substrate
itself in the pKa alteration or to the different buffers
used in the experiments. A kinetic analysis conducted
on different types of substrate-like inhibitors
in the absence and presence of the NS4A cofactor
has shown that the action of the NS4A peptide is
exerted only on the P'-side of the substrate (Landro
et al., 1997). From these findings, the authors conclude
that NS4A modulates NS3 activity by alteration
of the S' subsites.
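
To give a feel for what a pKa difference of 6.8 versus 7.2 means for the ionisation state of a titrating group, the sketch below applies the Henderson-Hasselbalch relation. The pH of 7.0 is an arbitrary value chosen for illustration and the function name is hypothetical; nothing here is taken from the cited titration experiments.

```python
def fraction_deprotonated(ph, pka):
    """Henderson-Hasselbalch: fraction of a titrating group in its basic form.

    [base] / ([base] + [acid]) = 1 / (1 + 10 ** (pKa - pH))
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Arbitrary illustrative pH; the two pKa values are those quoted in the text.
ph = 7.0
free_enzyme = fraction_deprotonated(ph, 6.8)     # NMR titration of free H57
with_substrate = fraction_deprotonated(ph, 7.2)  # kinetic pH-dependence
print(round(free_enzyme, 2), round(with_substrate, 2))
```

Under these assumptions a shift of 0.4 pKa units changes the deprotonated fraction noticeably near neutral pH, which is why the origin of the difference (substrate or buffer) matters for the interpretation.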

At the moment, despite knowledge of the crystal- 
lographic structures in the absence and presence of 
the NS4A cofactor, its mechanism of activation and
its role in the relative orientation of the catalytic triad are
not completely understood. Here, we illustrate the
first solution structure of the NS3 proteinase 
domain (first N-terminal 180 residues of NS3 with 
the addition of a solubilising six residue tail with 
the amino acid sequence ASKKKK) obtained in the 
absence of the NS4A cofactor and based on NMR 
data. The novelty of our findings is that the global 
architecture responsible for the relative positioning 
of the catalytic residues is already present in the 
absence of the NS4A cofactor. This difference from 
the relative crystallographic structure ns3 accounts 
for all the biochemical evidence available to date. 
The action of the cofactor is then discussed in terms 
of stabilisation of the fold of the N-terminal region 
and by its influence on the substrate leaving-group 
S' side, while an influence on the substrate recog- 
nition S side can be excluded. Also, a possible role 
of the substrate in the relative positioning of the cat- 
alytic triad is proposed and discussed. 

A rather unusual structural feature of the NS3
enzyme is the presence of a zinc-binding site completely
exposed to the solvent. We find that this
site in solution undergoes a conformational
exchange between an open and a closed conformation
by switching the side-chain of H149 on the
hundreds of milliseconds timescale.

Results and Discussion 

Structure determination 

The solution structure of the protease domain of
the hepatitis C virus NS3 protein of the strain BK
was solved by multidimensional heteronuclear
NMR spectroscopy (Clore & Gronenborn, 1991a;
Bax & Grzesiek, 1993) making use of uniformly
labelled 15N, 15N/13C and 2H/15N/13C samples, as
well as of a selectively 15N[Leu], 15N[Val] and
15N[Ala]-labelled sample. Complete resonance
assignment was obtained, except for 15 residues at
the N terminus and a few other signals. In fact, the
proton resonances of residues 6-21 appear to be
broadened beyond detection by conformational
exchange. The resonances of leucine residues 14, 15 and 21,
for example, were identified as very broad
peaks in a sample selectively labelled with leucine
but it was not possible to identify them sequentially.
The general quality of the NMR data is shown in
Figure 1, which depicts strips from (a) a 3D 13C
edited NOESY and (b) a 15N edited
NOESY. The structure was calculated, excluding
the first 21 residues, by simulated annealing
(Nilges et al., 1988) using the database of constraints
shown in Table 1, where a summary of
the structural statistics is given. In Figure 2, a
stereoview of the overlay of the 20 lowest-energy
structures generated is shown. For simplicity we
will refer in the following to the minimised
average solution structure as nmr.

Description of the structure 

The nomenclature used for trypsin β-strands and
applied to the topology of the protein is shown in
Figure 3(a). The sequence alignment of NS3 with
several other chymotrypsin-like serine proteinases
is proposed in Figure 3(b), on the basis of the
structural overlay, following the guidelines indicated by
Greer (1990). This alignment differs slightly from
that proposed previously (Love et al., 1996). The

Figure 1. Examples of spectra showing the quality
achieved with this protein. (a) Strips taken from a 13C
edited 3D NOESY show the aromatic Hε and Hδ NOEs
of the catalytic histidine residue. (b) Strips of the amide
protons of both the catalytic histidine and serine residues
from a 15N edited 3D NOESY.



Table 1. Experimental restraints and structural statistics

A. NMR constraints

NOE               Total                            2476
                  Intra                            945
                  Inter short distance (<i+3)      521
                  Inter long-range (>i+3)          1010
Generic           Total                            70
                  H-bond                           64
                  Zn-binding site                  6
Dihedral          Total                            83
                  φ                                37
                  χ1                               43
Stereospecific    Methylene groups                 31/219
                  Methyl groups                    50/66

B. Structure statistics

R.m.s. deviations from experimental constraints(a)
    Distance (Å)                                   0.076 ± 0.003
    Dihedral (deg.)                                1.331 ± 0.148
    13Cα                                           1.49 ± 0.05
    13Cβ                                           1.09 ± 0.06
Deviations from idealized geometry
    Bonds (Å)                                      0.005 ± 0.0006
    Angles (deg.)                                  0.761 ± 0.032
    Impropers (deg.)                               0.543 ± 0.01
Coordinates precision referred to mean structure (Å)
    Residues SCR+helices
        Backbone                                   0.472 ± 0.089
        All heavy atoms                            1.147 ± 0.160
    All residues
        Backbone                                   0.872 ± 0.097
        All heavy atoms                            1.306 ± 0.117
    N-terminal residues SCR+helices
        Backbone                                   0.554 ± 0.140
        All heavy atoms                            1.020 ± 0.292
    C-terminal residues SCR+helices
        Backbone                                   0.233 ± 0.042
        All heavy atoms                            0.567 ± 0.091

C. Ramachandran analysis(b)

% Residues in most favoured regions                70.4 ± 1.8
% Residues in allowed regions                      26.7 ± 1.4
% Residues in generously allowed regions           3.0 ± 0.6
% Residues in disallowed regions                   0.0 ± 0.0

(a) None of the structures exhibited distance violations greater
than 0.5 Å or dihedral angle violations greater than 5°.

(b) The program PROCHECK (Laskowski et al., 1993) was
used to assess the overall quality of the structures. The residues
with a heteronuclear NOE 15N-1H < 0.6 (total 28 residues)
were excluded from the computations because of their intrinsic
mobility.



structurally conserved regions (SCR) are indicated
with boxes in the alignment. The overall sequence
similarity is very low (<20%), but the general topology
is well conserved. The NS3 proteinase, like
the proteinase from Sindbis virus (Tong et al., 1993)
and from the Semliki Forest virus (Choi et al.,
1997), is a small proteinase (about 180 residues)
and, as such, makes an economical use of loops,
lacking all of a series of connecting elongations
that are a common feature of cellular proteinases:
we will see below that this has a particularly relevant
consequence.

The N-terminal β-barrel appears to be less compact
than the C-terminal one (Figure 2). Evidence
for this is obtained from a comparison of the number
of slow-exchanging amide protons between the
two domains: 12 in the N-terminal and 20 in the
C-terminal domain. In Figure 4(a), (b) and (c) the
number of observed NOEs per residue, the backbone
heavy atoms r.m.s.d. per residue, and the
heteronuclear 1H-15N NOEs are reported, respectively.
The total number of NOEs for the N-terminal barrel
is 990, whereas for the C-terminal it is 1486. The
1H-15N NOE values of 17 residues in the N-terminal
portion of the molecule are below 0.6 (indicating
a high level of mobility), while this is the case
for only six residues in the C-terminal domain
(excluding the residues forming the solubilising
tail). These differences are reflected in the poorer
r.m.s.d. of the N-terminal barrel compared to the
C-terminal one (Figure 4(b)).

Figure 2. A stereoview of the 20 minimum-energy structures is shown. For the overlay, the SCR residues identifying
the β-strands plus the helicoidal segments were used. Due to its peculiar mobility (see the NS4A interaction section),
strand D1 was omitted in the calculation of the r.m.s.d. The first 21 residues were not included in the structure
calculation, because for them no structural information was available.

Table 2 illustrates the pairwise comparison of
the SCR residues between the nmr and the ns3 and
ns3-4a structures. The r.m.s.d. values for the backbone
heavy atoms are 1.45 and 1.18 Å, respectively.
However, if we consider the N and C-terminal
β-barrels separately, we get a better
insight into the comparison. There is a substantial
difference in the N-terminal β-barrel, with
an r.m.s.d. of 1.68 Å (nmr/ns3), of 1.48 Å (nmr/
ns3-4a) and of 1.98 Å (ns3/ns3-4a). The nmr structure
differs to a similar extent from either
of the two crystallographic structures, thereby
suggesting that, in the crystal of the uncomplexed
enzyme, crystal forces are responsible for distorting
the N-terminal β-barrel. The C-terminal β-barrel,
with an r.m.s.d. of 0.52 Å (nmr/ns3), of 0.56 Å
(nmr/ns3-4a) and of 0.41 Å (ns3/ns3-4a), is very
similar in all three structures. However, a major
difference in this domain is given by the fact that
in solution (thus in the absence of the NS4A peptide)
we find helix α3 (residues 172-182), while this
helix is absent from the ns3 (in the absence of
NS4A peptide) structure. In Figure 5(a) a zoomed
view of the overlay of the ns3 and nmr structures
is shown, providing evidence for this difference,
which is extremely relevant in the evaluation of
the role played by the NS4A peptide to activate the
proteinase and will be further discussed in the
catalytic triad and substrate-binding section.
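
For readers unfamiliar with the quantity, the r.m.s.d. values quoted above are root-mean-square distances between equivalent atoms of two structures after superposition. The sketch below computes only the final distance measure and assumes the two coordinate sets are already superimposed and listed in the same atom order; the superposition step itself is not shown, and the function name and toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z).

    Assumes the structures have already been superimposed and that atom i in
    coords_a corresponds to atom i in coords_b.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equally long")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example: three atoms, each displaced by 1 A along z.
structure_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
structure_2 = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
value = rmsd(structure_1, structure_2)
print(round(value, 3))
```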

Table 2 also reports the pairwise comparisons
with several other serine proteinases belonging to
the chymotrypsin family. Also in this case it is
instructive to consider the two β-barrels separately.
The fold of the N-terminal SCR residues in solution
is similar to that of chymotrypsin (Blevins &
Tulinsky, 1985), trypsin (Bode et al., 1984) and elastase
(Meyer et al., 1988), with an r.m.s.d. of 1.37,
1.40 and 1.42 Å, respectively, while the ns3 structure
gives 1.80, 1.91 and 1.84 Å, respectively, and
the ns3-4a structure gives 2.12, 2.12 and 2.11 Å,
respectively. One can conclude that, in the absence of
the NS4A peptide, the overall fold of the β-barrel
is conserved; when forming the complex with the
NS4A peptide, the N-terminal β-barrel undergoes a
substantial change in conformation (specifically,
the D1 strand and the loops preceding and following
it; Figure 3(a)) that differentiates it locally from
the other proteinases belonging to the same family.

The C-terminal β-barrel of all the NS3 structures,
although very similar to each other, differs from
those of the other chymotrypsin-like proteinases
(r.m.s.d. >2.0 Å). This is partially due to the
slightly different packing of the E2 strand, which is
directly involved in substrate binding, and of the
C2 strand, which packs directly against it
(Figure 3(a)). This difference can be accounted for
by the peculiarity of the substrate recognition surface
of NS3. In fact, substrate specificity studies
have shown that the NS3 proteinase requires at
least a decamer peptide spanning P6-P4' for optimal
activity. The substrate frame needed is thus
unusually long for serine proteinases.

Incidentally, the only other viral serine proteinases,
Sindbis virus (Tong et al., 1993) and
Semliki Forest virus core protein (Choi et al.,
1997), for which a structure is available in the
Brookhaven database, are more similar to NS3,
with an r.m.s.d. for the C-terminal β-barrel

Figure 3. Topology and sequence
alignment. (a) The topology of the
NS3 proteinase domain is represented
in a MOLSCRIPT view.
The SC regions are constituted by
the strands A1-F1 in the N-terminal
and A2-F2 in the C-terminal β-barrel.
The helices are named α1, α2
and α3. (b) The structure alignment
of NS3 with chymotrypsin-like serine
proteinases. The alignment
results from the best superposition
of the Cα of the SCR residues, in
the box. The upper numbering
refers to chymotrypsin, while
the bottom refers to the NS3
proteinase domain. In bold are the
residues of the catalytic triad, while
the asterisk (*) at the top indicates
the position of residue S214, which
is strictly conserved in all cellular
proteinases, and its corresponding
residue in the two viral proteinases.
Below the NS3 amino acid
sequence are shown the corresponding
secondary structure elements;
strands are coincident with the
boxed regions, while the helices are
underlined.



ranging between 1.6 and 1.72 Å for all the pairwise
comparisons.

The positioning of NS4A 

The lower definition of the N-terminal β-barrel
in the solution structure should be related to the
absence of the NS4A protein cofactor. In Figure 6,
a zoomed view of the overlay between the nmr
(blue) and the ns3-4a (NS3 cyan, NS4A magenta)
structures is shown. It has been shown by deletion
mutagenesis experiments (Failla et al., 1995), and
more recently by the ns3-4a crystallographic structure
(Kim et al., 1996; Yan et al., 1998; Figure 6),
that only the N-terminal β-barrel of the proteinase
is involved in binding the NS4A peptide cofactor.



According to these structures, the NS4A peptide
cofactor is almost completely buried inside the core
of the N-terminal β-barrel, where it forms a β-strand
within a four-stranded β-sheet. Its companion
strands are formed by residues 4-10 (strand
A0) and 33-37 (A1). Residues 13-21 form a short
α-helix (α0) that is likely to contribute to the stability
of the NS3-NS4A complex in the crystal. This helix
is very peculiar, since the residues exposed to the
solvent are three leucine and two isoleucine residues
(Figure 6). It is unlikely that such a structure
could exist in solution. In this respect, it is interesting
to note that only one of the monomers in the
crystallographic asymmetric unit is folded in the
way just described (Kim et al., 1996). In the second
monomer, the first 30 residues do not have a

Figure 4. Correlation between the number of NOE constraints, mobility and r.m.s.d. The SCR strands are defined
by the arrows and reported in all three panels to simplify reading the Figure. (a) A representation of the number of
NOEs per residue; in black are represented the NOEs involving backbone protons, while in grey are represented
those involving the side-chain protons. (b) Behaviour of the r.m.s.d. per residue. Regions in which the r.m.s.d. is poor
can be related to low values of the 1H-15N NOE (<0.6) and are in black, or to a lack or low number (<10) of NOE
constraints, and are in white. Only two regions do not follow this characteristic, namely residues 39-41 and 120-122;
both are well-characterized turns but lack long-range NOEs. (c) Heteronuclear 1H-15N NOE value per residue number.
The broken line drawn at the value of 0.6 indicates the mobile regions (<0.6).



defined structure. Despite these differences, both 
complexes contain the NS4A peptide in essentially 
the same position. It could be argued that more 
than one conformation can be assumed by the N- 
terrninal 30 amino acid residues of NS3 even in the 
presence of NS4A. In our experiments, residues 6- 
21 signals are broadened almost beyond detection, 
moreover residues 25-31 exhibit a high mobility in 
solution (Figure 4(c)). Therefore, it seems reason- 
able to assume that some type of mobility is affect- 
ing all the N-terminal 31 residues. Evidence 
obtained by limited proteolysis experiments 
suggest indeed that the N-terminal region of 
NS3 is highly accessible in solution, even in the 
complex with the NS4A peptide cofactor (data not 
shown). 
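The mobility criterion used above (heteronuclear ¹H-¹⁵N NOE below 0.6) can be illustrated with a minimal sketch. The function name and the NOE values in the usage note are hypothetical and are not data from the paper; only the 0.6 cutoff comes from the text.

```python
from typing import Dict, List, Tuple

MOBILITY_CUTOFF = 0.6  # threshold quoted in the text and in Figure 4(c)


def mobile_segments(noe: Dict[int, float],
                    cutoff: float = MOBILITY_CUTOFF) -> List[Tuple[int, int]]:
    """Group residues with NOE < cutoff into contiguous (first, last) ranges."""
    segments: List[Tuple[int, int]] = []
    current: List[int] = []  # residues of the segment currently being built
    for res in sorted(noe):
        is_mobile = noe[res] < cutoff
        contiguous = bool(current) and res == current[-1] + 1
        if is_mobile and (contiguous or not current):
            current.append(res)
        else:
            # Close the open segment on a rigid residue or a numbering gap.
            if current:
                segments.append((current[0], current[-1]))
            current = [res] if is_mobile else []
    if current:
        segments.append((current[0], current[-1]))
    return segments
```

For example, with made-up values `{24: 0.75, 25: 0.45, 26: 0.50, 27: 0.55, 28: 0.80, 61: 0.40, 62: 0.58, 63: 0.70}` the function reports the two stretches 25-27 and 61-62 as mobile.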

The N-terminal residues of the NS4A peptide directly contact the D1
strand and the preceding and following loops, α1-D1 and D1-E1,
respectively (Figure 6). In solution, these regions together with the
strand D1 itself are characterised by a high degree of mobility
(¹H-¹⁵N NOE < 0.6; Figure 4(c)). From the overlay of the nmr and ns3-4a
structures shown in Figure 6, it is evident that the whole region
encompassing residues 61-66 (loop α1-D1 and strand D1) is held down
toward the F1 strand in the crystallographic structure, while in
solution this strand packs across the A1 strand, compensating for the
absence of the NS4A peptide. From this comparison it can be concluded
that one of the roles exerted by NS4A is to stabilize the strands A1 and
D1 and the loops α1-D1 and D1-E1 in a more defined conformation, thus
compacting the whole N-terminal barrel. Its influence on the
substrate-binding region is primarily due to the direct interaction and
consequent conformational stabilisation of the strand A1 and the region
α1-D1, which
form the walls surrounding the P'-side of the substrate (see the next
section).

Table 2. The r.m.s.d. comparison with several serine proteinases

N-terminal
        2snv   5cha   2alp   1ntp   3est   2sga   3sgb   ns3    ns34a
nmr     1.91   1.37   1.86   1.40   1.42   1.70   1.82   1.68   1.48
2snv           1.45   2.12   1.60   1.50   2.25   2.03   1.77   2.45
5cha                  1.81   0.36   0.25   1.67   1.73   1.80   2.12
2alp                         1.78   1.78   1.09   0.37   2.02   2.17
1ntp                                0.33   1.64   1.71   1.91   2.12
3est                                       1.66   1.70   1.84   2.12
2sga                                              1.07   1.96   1.77
3sgb                                                     1.90   2.11
ns3                                                             1.98

C-terminal
        2snv   5cha   2alp   1ntp   3est   2sga   3sgb   ns3    ns34a
nmr     1.72, 2.20, 1.86, 2.18, [illegible], 1.14 (remaining entries and column positions illegible)
2snv           1.51   1.66   1.51   1.52   1.61   1.61   1.60   1.66
5cha                  0.75   0.45   0.49   0.72   0.79   2.13   2.18
2alp                         0.76   0.80   0.42   0.47   2.06   2.14
1ntp                                0.63   0.69   0.73   2.11   2.18
3est                                       0.78   0.83   2.16   2.22
2sga                                              0.24   2.02   2.10
3sgb                                                     2.02   2.10
ns3                                                             0.41

All
        2snv   5cha   2alp   1ntp   3est   2sga   3sgb   ns3    ns34a
nmr     [?]    [?]    [?]    2.?3   2.09   2.06   2.11   1.45   1.18
2snv           1.66   2.06   1.74   1.63   2.07   1.99   1.95   2.27
5cha                  1.51   0.46   0.52   1.40   1.44   2.04   2.31
2alp                         1.44   1.47   0.83   0.47   2.14   2.25
1ntp                                0.60   1.33   1.38   2.06   2.26
3est                                       1.40   1.44   2.08   2.31
2sga                                              0.77   2.12   2.03
3sgb                                                     2.04   2.21
ns3                                                             1.66

Root-mean-square deviation (Å) comparison of the SCR backbone residues heavy atoms for several serine proteinases. The names
are given as Brookhaven PDB codes: 1bt7 (nmr); 1a1q (ns3); 1jxp (ns3-4a); 2snv (Sindbis virus core protein); 5cha (bovine alpha-chymotrypsin);
2alp (alpha lytic protease); 1ntp (bovine beta trypsin); 3est (porcine elastase); 2sga (Streptomyces griseus protease A); 3sgb
(Streptomyces griseus protease B). Due to the high degree of similarity between the Sindbis virus and the Semliki Forest virus core
proteins (1vcp) (r.m.s.d. 0.4 Å), the comparisons are reported only for the former.
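For readers unfamiliar with the quantity tabulated in Table 2, the following is a minimal sketch of a root-mean-square deviation between two sets of equivalent atoms. It assumes the two structures have already been optimally superimposed and that the atoms are listed in matching order; the superposition step is not shown, and the function name is illustrative only.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Root-mean-square deviation (same units as the input coordinates).

    Assumes `a` and `b` hold equivalent atoms in the same order and that
    the two structures have already been superimposed.
    """
    if len(a) != len(b) or not a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(a, b)
    )
    return math.sqrt(total / len(a))
```

Two atoms each displaced by exactly 1 Å between the structures give an r.m.s.d. of 1.0 Å, which puts the values of roughly 0.2-2.5 Å in Table 2 in perspective.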



The high level of similarity between the C-terminal barrel of the
solution structure and that of the crystallographic structures is
quantified by the low r.m.s.d. of the backbone heavy atoms for the SCR
residues (Table 2). It can be concluded that the complex with the NS4A
peptide has little, if any, influence on this region of the enzyme,
which contains the recognition pockets for the P-side residues of the
substrate.

These conclusions find experimental support from a steady-state kinetic
analysis of inhibitor binding to the active site of the NS3 proteinase
(Landro et al., 1997). In fact, two classes of competitive inhibitors
could be identified: those interacting only with the P binding pockets
located on the C-terminal β-barrel and those compounds extending their
interaction with the enzyme to both the P and P' binding sites. The
potency of the former category of inhibitors was not influenced by
complex formation with NS4A. In contrast, the affinity of active-site
ligands relying only on contacts with the P' binding site on the
N-terminal β-barrel was strongly impaired in the absence of NS4A.

Figure 5. Pairwise superposition of the nmr (blue) with ns3 (red) and
ns3-4a (cyan) structures. The superposition is obtained from the
r.m.s.d. of all the SCR residues. The catalytic triad residues are shown
on only one of the structures to facilitate the comparison. The position
of D81 in the nmr structure is indicated by a dot, since its position is
not accurately determined. The residues were not shown for ns3, since
only the Cα coordinates are publicly available. (a) nmr/ns3: the strands
E1-F1 containing the loop that bears the catalytic Asp are different. In
the ns3 structure, the helix α3 is essentially absent. (b) nmr/ns3-4a:
the strands E1-F1 bearing the Asp residue are similarly positioned, and
the helix α3, which is packed mainly against the strand E1, is very
similar in both structures.



The catalytic triad and substrate-binding region 

The serine proteinases of the chymotrypsin family share a number of
elements: a catalytic triad formed by residues Ser-His-Asp; a site for
hydrogen bonding to a tetrahedral oxyanionic intermediate of the
reaction (also called the oxyanion hole); a strand forming an
antiparallel β sheet with the P-side of the polypeptide chain of the
substrate (also called the S site), which also contributes to the
formation of a recognition pocket; and a leaving-group recognition site
(also called the S' site). Each of these elements occurs in the
different members of the family in an almost identical geometric
relationship.

The alignment with other serine proteinases shows that these residues
and the nearby positions are well conserved (Figure 3(b)). From the
structural point of view, the relative position of H57 and S139 is
similar in the nmr and ns3-4a structures, and the two residues are only
slightly further apart than in the other serine proteinases (Table 3).
On the contrary, in the ns3 structure "the imidazole of H57 [...] is
oriented toward S139 but is not close enough to form the H-bond observed
in proteinase structures" (Love et al., 1996). The difference in
distance (Table 3) is very likely due to distortions induced by the
crystal forces, since in solution we also find that the observed pKa
value of 6.8 for H57 (Urbani et al., 1998) (uncomplexed enzyme) is in
agreement with the catalytic histidine residue being H-bonded with the
catalytic serine, as expected from the canonical model of serine
proteinases.

In the nmr structure, the position of D81 is not accurately determined
because of the lack of experimental constraints. As a matter of fact,
for the resonances of residues 79, 80 and 81 no inter-residue NOE is
observed and the signals arising from the amide protons of these
residues are affected by their fast water exchange rates. The resulting
calculated structures show that several allowed conformations of this
loop are sampled in the simulated annealing computations, all of them
solvent-exposed.

[Figure 6 labels: Strand A1; NS4A N-terminal; Strand D1; Strand F1]

Figure 6. The role of NS4A; zoomed view of the superposition of the
nmr/ns3-4a structures (blue/cyan-magenta). The catalytic triad residues
as well as the R155 side-chain are shown to facilitate the comparison.
The role of NS4A seems to be to order the N-terminal 21 residues in an
initial strand and a subsequent helix that turns around the N terminus
of NS4A. However, this helix exposes hydrophobic residues to the solvent
while packing the hydrophilic residues inside. A situation like this is
not possible in solution. The positioning of NS4A has the consequence of
order-

Table 3. Geometric relation of the catalytic residues in serine
proteinases

          His-Ser (Å)   His-Asp (Å)   Asp-Ser (Å)
nmr       9.0           -             -
ns3       9.7           7.9           11.5
ns3-4a    9.3           6.5           10.5
2snv      8.8           7.0           10.6
5cha      8.5           6.5           10.5
1ntp      8.5           6.3           10.1
2alp      8.3           6.2           9.9
3est      8.4           6.5           9.8
2sga      8.5           6.2           9.9
3sgb      8.4           6.3           10.1

Comparison of the distances between the Cα atoms of the catalytic triad
in NS3 structures and in several serine proteinases. The actual
distances His-Asp and Asp-Ser for the nmr structures are not included,
since the whole loop containing D81 undergoes substantial conformational
averaging, so that a single evaluation of distance would not be
meaningful.
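The comparison of His-Ser distances drawn from Table 3 can be made explicit with a short sketch. The values are transcribed from Table 3 (the ns3 entry, printed as "97" in the scan, is read here as 9.7 Å, which is an assumption); the variable and function names are illustrative.

```python
from statistics import mean
from typing import Dict

# His-Ser distances (Å) from Table 3; ns3 read as 9.7 (assumed decimal point).
HIS_SER_NS3: Dict[str, float] = {"nmr": 9.0, "ns3": 9.7, "ns3-4a": 9.3}
HIS_SER_REFERENCE: Dict[str, float] = {
    "2snv": 8.8, "5cha": 8.5, "1ntp": 8.5, "2alp": 8.3,
    "3est": 8.4, "2sga": 8.5, "3sgb": 8.4,
}


def his_ser_excess() -> Dict[str, float]:
    """Excess of each NS3 His-Ser distance over the reference mean (Å)."""
    reference_mean = mean(HIS_SER_REFERENCE.values())
    return {name: round(d - reference_mean, 2) for name, d in HIS_SER_NS3.items()}
```

Under these assumptions the nmr and ns3-4a structures exceed the mean of the other proteinases by about 0.5 and 0.8 Å, and the ns3 structure by about 1.2 Å, in line with the qualitative statement in the text.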

It is worthwhile to point out that in the ns3-4a crystal structure, too,
this loop is solvent-exposed and the temperature factors of the backbone
atoms of the three residues 79-81 are around 50.0 Å², which indicates a
high degree of mobility even in the presence of the NS4A peptide.
Moreover, we find that, in the average solution structure of the
isolated NS3 proteinase domain, the strands E1 and F1, which enclose the
loop bearing D81, are positioned similarly to the crystal ns3-4a
structure despite the absence of the cofactor, as clearly shown in the
zoomed view of the overlay between the nmr and ns3-4a structures in
Figure 5(b). These strands are held in position by the packing of the
helix α3 (residues 172-182), which in solution is similar to that in the
ns3-4a crystal (Figure 5(b)). On the contrary, in the ns3 structure this
helix is absent (Figure 5(a)), strongly suggesting that the crystal
packing is inducing a distorted conformation in this region. The
consequence is that the strands E1 and F1 that position D81 are
distorted and thus its positioning in the ns3 structure is not
compatible with a well-formed catalytic triad. Our results in solution
are thus in contrast with the conclusions drawn by Love et al. (1998)
based on the crystallographic structures alone. On the other hand, our
results are in agreement with the biochemical evidence that the presence
of NS4A does not affect the pKa of the catalytic residues (Landro et
al., 1997), indicating that the catalytic triads of the enzyme-substrate
complex and of the ternary enzyme-substrate-NS4A complex must possess
very similar geometries.

From all the above considerations, we conclude that other factors in
addition to the presence of NS4A must be invoked to stabilize the
position of the D81 residue in a canonical catalytic triad
configuration. The identification of such factors could be attempted by
analysing the structures of several members of the chymotrypsin-like
family of serine proteinases. Ideally, we can divide it into two
subfamilies; namely, the short-chain proteinases, of about 180 residues,
and the long-chain proteinases, of about 250 residues (Bazan &
Fletterick, 1998). In all the solved long-chain structures there are two
conserved characteristics: (i) the residue at position 214 is invariably
serine, which forms an H-bond with the carboxylic group of the catalytic
aspartate residue, thus helping in its correct positioning and alignment
with respect to the catalytic histidine; (ii) the catalytic aspartate
residue is shielded from the solvent by hydrophobic residues at
conserved positions. In Table 4 these characteristics are summarized for
several long-chain serine proteinases and for the short-chain
proteinases for which a structure is available. The short-chain
proteinases do not share these characteristics, and incidentally they
are all of viral origin.

The protection from the solvent allows the direct observation of the
δ HN proton NMR signal of the histidine residue involved in the H-bond
with the carboxylic moiety of the aspartate residue (Frey et al., 1994).
This proton resonates at an unusually low-field chemical shift in the
range of 14.5-19.0 ppm depending on pH, and has been observed in several
serine proteinases of the long-chain family (Markley, 1978; Bachovchin,
1985; Frey et al., 1994). In the case of the NS3 proteinase, however,
despite all attempts, such a signal has until now not been observed.
From both the nmr and ns3-4a structures, one could argue that this is
due to the site being solvent accessible. We speculate that, in
solution, the aspartate residue is unlikely to be engaged in an H-bond
with the histidine residue, even in the presence of the NS4A cofactor.
However, an environment similar to that of the long-chain subfamily
could be partially created with the participation of the substrate. We
observe, in fact, for the short-chain enzymes a tendency to have a bulky
hydrophobic residue in position P2 (Tong et al., 1993; Ingallinella et
al., 1998), whereas long-chain enzymes tend to have either glycine or
short side-chain residues such as Ala (Yasutake & Powers, 1981; Chang,
1986; Coombs et al., 1996). A bulky residue in position P2 together with
the aliphatic methylene groups of R155 (NS3) and of the L231 side-chain
(Sindbis and Semliki viruses) could contribute to sheltering the
aspartic acid side-chain, thereby favouring the formation of a catalytic
triad machinery that more closely resembles that observed in long-chain
enzymes.

Table 4. Summary of some serine proteinases that present both Ser in position 214 and the catalytic Asp sheltering
from the solvent, and the few known exceptions

Serine proteinase             PDB entry  Category  Origin     214  Sheltering       Reference
Trypsin                       1NTP       LC(a)     Mammalian  Ser  Y94, L99         Bode et al. (1984)
Chymotrypsin                  5CHA       LC        Mammalian  Ser  Y94, I99         Blevins & Tulinsky (1985)
Kallikrein                    2PKA       LC        Mammalian  Ser  F94, Y99         Bode et al. (1983)
Thrombin                      1HXE       LC        Mammalian  Ser  Y94, L99         Bode et al. (1989)
Mast cell proteinase          3RP2       LC        Mammalian  Ser  Y94, N99         Remington et al. (1988)
Tonin                         1TON       LC        Mammalian  Ser  Y94, L99         Fujinaga & James (1987)
Elastase                      1HNE       LC        Mammalian  Ser  Y94, L99         Meyer et al. (1988)
Streptomyces griseus A        2SGA       LC        Bacterial  Ser  F94, Y171, V177  Moult et al. (1985)
Streptomyces griseus B        3SGB       LC        Bacterial  Ser  F94, Y171, V177  Read et al. (1983)
Alpha lytic protease          2ALP       LC        Bacterial  Ser  F94, Y171, V177  Fujinaga et al. (1985)
Sindbis core protein          2SNV       SC(a)     Viral      Leu  -                Tong et al. (1993)
Semliki Forest core protein   1VCP       SC        Viral      Leu  -                Choi et al. (1997)
NS3(M80) HCV                  1A1R       SC        Viral      Arg  -                Kim et al. (1996)
                              1JXP                                                  Yan et al. (1998)

(a) LC, long-chain; SC, short-chain serine proteinases.

The pH titration data obtained in the presence of substrate (Landro et
al., 1997) show that the pKa value of 7.2 of the catalytic residues of
the NS3 proteinase is NS4A-independent, but this value is different from
that of 6.8 found for the catalytic histidine residue of the free enzyme
(Urbani et al., 1998). One could thus speculate that a role in the
enhancement of the pKa value is played by the substrate itself; however,
it is possible that the difference observed is due to the different
buffer conditions used in the experiments. Preliminary spectroscopic
evidence in agreement with the hypothesis of an involvement of the
substrate itself in the conformational stabilization of the catalytic
triad has been collected on substrate-based inhibitors (Cicero et al.,
1999).
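To give a feel for what a pKa shift from 6.8 to 7.2 means, the following sketch applies the Henderson-Hasselbalch relation to a single titratable group. The pH of 7.0 in the usage note is an arbitrary illustration, not a condition taken from either cited study, and the function name is hypothetical.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Fraction of a single titratable group in its protonated form.

    Henderson-Hasselbalch for one site: f = 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))
```

At an illustrative pH of 7.0, a group with pKa 6.8 is about 39% protonated, whereas one with pKa 7.2 is about 61% protonated, so a shift of 0.4 pH units is far from negligible near neutral pH.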

The residues preceding the catalytic serine residue form the
oxyanion-stabilizing loop (residues 137-139; Figure 7). The S site
comprises strand E2, which forms one side of the specificity pocket,
with the A157 amide proton and carbonyl group accessible to form
backbone-to-backbone H-bonds with the substrate P3 residue according to
a classical chymotrypsin-like substrate interaction. The recognition
pocket is shallow and apolar, being formed by the methyl groups of A157
and L135 on the side and by the F154 aromatic ring at the bottom
(Figure 7). Features of this pocket have been predicted from modelling
studies (Pizzi et al., 1994).

The S' site is constituted by the end of the A1 strand (T38) and the
beginning of the B1 strand, as well as the A1-B1 short loop (Figure 7).
Strand A1 packs directly with the NS4A-activating peptide in the ns3-4a
structure, thus explaining a direct influence of the NS4A peptide on the
P' portion of the substrate (leaving group). A more detailed insight
into the S binding site, the detailed identification of the residues
involved and their role in the interaction with substrate-based
inhibitors are presented in the accompanying paper (Cicero et al.,
1999).

Figure 7. Zoomed view of the substrate interaction region. The S' and S
regions range approximately from strand A1 (identified by the position
of T38) to loop E2-F2 (G162), encompassing the rather flat surface
defined by the E2 strand (F154 bottom of recognition pocket, A157 H-bond
candidate with substrate P3 partner) on one side and the α2 helix
(oxyanion hole G137-S139; L135 delimiting the top side of the
recognition pocket) on the other side.

The Zn-binding site 

In a previous study, we showed that a zinc ion is required for the
structural integrity and activity of the NS3 proteinase and, from
modelling studies, the coordination site was predicted to be comprised
of loops F1-A2 (C97 and C99) and B2-C2 (C145 and H149; De Francesco et
al., 1996). Subsequent publication of the crystallographic structures
confirmed the predictions. In the crystallographic structures, the zinc
coordination is essentially tetrahedral. However, the imidazole moiety
and the zinc atom are too distant to be directly bound, and the authors
postulated the presence of a water molecule acting as a bridge (Love &
Hostomska, 1996; Kim et al., 1996).

The zinc-binding site in solution, as determined by NMR, is illustrated
by Figure 8. Our data identify the Nδ of H149 as the one involved in the
binding to the zinc ion. This structural result is in agreement with our
previous findings on the tautomeric state of H149, with Nε being in the
α state and Nδ being in the β state (Urbani et al., 1998). We also found
that the imidazole moiety of H149 modulates the accessibility of the
zinc ion, allowing an "open" and a "closed" conformation in the
protonated and unprotonated state, respectively, which interconvert on
the 100 ms timescale (Urbani et al., 1998). In this respect, one should
notice that in the ns3 structure (Love et al., 1996) H149 is postulated
to participate in metal coordination in only two of the three molecules
in the asymmetric unit, whereas in the third one the imidazolyl
side-chain moves away. In the experimental conditions used in this
structural study, the closed conformation is dominant. In the
NMR-derived structure, the imidazole moiety was not subject to specific
restraints that would force the coordination with the zinc atom (see
Materials and Methods) and in all the resulting structures it is
positioned too far away to directly chelate the zinc atom. This result
is in agreement with our previous findings, in which H149 is ligated to
the metal using the δ1 N through a hydroxyl group (Urbani et al., 1998).

Figure 8. Zoomed view of the zinc-binding site. The residues chelating
the zinc ion are shown; the unusually long stretch that links C145 to
H149 is evident. This longer linker could be required to allow the
conformational switch of H149 and leave space accessible to the zinc
ion.

Most of the chymotrypsin-like proteinases have 
disulphide bridges that are believed to maintain 
the relative orientations of the residues involved in 
catalysis (Lesk & Fordham, 1996). Disulphide 
bridges present in these extracellular serine pro- 
teinases are unlikely to be stable in the reducing 
intracellular milieu. Since NS3 is an intracellular 
proteinase, we proposed that the zinc-binding site 
is used to stabilise the relative orientation of the 
two β-barrel domains, thus indirectly influencing 
the position of the catalytic triad, which is located 
in the crevice between the domains (De Francesco 
et al, 1996). 



However, we observe also that the histidine resi- 
due is highly conserved in all the HCV strains and 
HCV-related viruses. This fact, together with the 
unusual location of the site, i.e. completely solvent- 
exposed, and its dynamic behaviour in solution 
may suggest that there could be a biologically rel- 
evant function, not yet clarified, linked to this zinc- 
binding site. Hepatitis C is a small virus that 
encodes only six non-structural proteins. Therefore, 
as observed in other viruses, multi-functional proteins
are a strategy to minimise the number of 
agents needed during viral replication. Then the 
optimisation of the role of the zinc ion and the 
peculiar features of its binding site would be just 
part of the same biomolecular strategy. 

Materials and Methods 

Expression, purification and solubilisation 

Escherichia coli cells BL21(DE3) were transformed with a plasmid
containing the cDNA coding for the serine proteinase domain of NS3 under
the control of the bacteriophage T7 gene 10 promoter. A solubilisation
tag (ASKKKK) was inserted at the C terminus of the NS3 enzyme sequence
(Steinkühler et al., 1998). The ¹⁵N, ¹⁵N/¹³C and ¹⁵N/¹³C/²H uniformly
labelled samples were obtained by allowing the cells to grow in M9
minimal medium supplemented with 1 g/l [¹⁵N]ammonium sulphate (Martek)
and 2.0 g/l [¹³C]glucose (Martek). A further addition to the medium of
6.8 mg/l ZnCl₂ was necessary, since the protein is a zinc-binding
proteinase. For the perdeuterated sample, growth was carried out in 99%
²H₂O (Martek). The ¹⁵N selectively labelled sample (Leu, Val and Ala)
was obtained by incorporation of 0.33 g/l each of ¹⁵N selectively
labelled Leu, Val and Ala in M9 modified medium. The NMR samples were
prepared by dialysis against a buffer containing 20 mM sodium phosphate,
4% deuterated glycerol (Isotec Inc.), 3 mM DTT, 1.5 mM Chaps (pH 6.3)
and 5 mM NaN₃. The protein concentration was in the range of 0.7-0.9 mM.
The aggregation state of the samples was verified with dynamic
light-scattering and sedimentation equilibrium studies. The solutions
were monodispersed and the protein behaviour was compatible with a
monomeric state in solution (S. Di Marco & M. Sollazzo, unpublished
results).
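As a worked check of the zinc supplement quoted above, the mass concentration of 6.8 mg/l ZnCl₂ can be converted to a molar concentration. The atomic masses used (Zn 65.38, Cl 35.45 g/mol) are standard values and not taken from the paper; the function name is illustrative.

```python
ZN_ATOMIC_MASS = 65.38  # g/mol, standard atomic mass
CL_ATOMIC_MASS = 35.45  # g/mol, standard atomic mass
ZNCL2_MOLAR_MASS = ZN_ATOMIC_MASS + 2 * CL_ATOMIC_MASS  # about 136.28 g/mol


def mg_per_l_to_micromolar(mg_per_l: float, molar_mass: float) -> float:
    """Convert a mass concentration (mg/l) to micromolar (µmol/l)."""
    # mg/l divided by g/mol gives mmol/l; multiply by 1000 for µmol/l.
    return mg_per_l / molar_mass * 1000.0
```

With these values, 6.8 mg/l ZnCl₂ corresponds to roughly 50 µM zinc in the growth medium.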

Data collection and assignment 

All the NMR experiments were acquired at 298 K on Bruker AMX 500 MHz,
Varian Unity Plus 600 MHz, Bruker DMX 600 MHz and 800 MHz spectrometers,
all equipped with z-shielded gradient triple resonance probes. Spectral
assignments were obtained using the following 3D experiments: CT-HNCO,
CT-HNCA, CT-HNCOCA, CBCACONH, HNCAHA, HN(COCA)HA, HACACO, (H)CCH-COSY,
H(C)CH-COSY, H(C)CH-TOCSY, (H)CCH-TOCSY, C-CONH-TOCSY and ¹⁵N edited ¹H
TOCSY (Clore & Gronenborn, 1991a,b; Bax & Grzesiek, 1993). The following
3D experiments were acquired for the perdeuterated sample: CT-HNCA,
CT-HNCOCA, HNCOCACB, HNCACB (Yamazaki et al., 1994).

NOE-type 3D experiments were acquired (Clore & Gronenborn, 1991a): on
the perdeuterated sample, ¹⁵N edited NOESY (80 ms); on the
double-labelled sample, two sets of ¹³C edited NOESY (80, 100 and
150 ms), one optimized for aliphatic and another set for aromatic
residues; on the ¹⁵N single-labelled sample, ¹⁵N edited NOESY (80 and
150 ms) and ¹⁵N edited ROESY (20 ms). On the ¹⁵N selectively labelled
Leu, Ala and Val sample, ¹⁵N edited NOESY (100 ms) and TOCSY (24 ms)
were also acquired.

Coupling constants and stereospecific assignments: on the ¹⁵N sample, 3D
experiment HNHA (Kuboniwa et al., 1994); on the double-labelled sample,
3D HAHB (Grzesiek et al., 1994) experiment, and 2D CO and N decoupled
(Hu et al., 1997). A CT-HSQC was acquired on the 10% ¹³C-labelled sample
for the stereospecific assignment of the methyl groups of Leu and Val
(Neri et al., 1989).

Spectra were processed using NMRPipe (Delaglio et al., 1995) and
analysed using NMRView (Johnson & Blevins, 1994) software packages.

Structure calculation 

Approximate interproton distances were derived from the multidimensional
NOE spectra (Clore & Gronenborn, 1991a). NOEs were grouped into three
distance ranges: 1.8-2.8 Å (1.8-3.0 Å for NOEs involving HN protons),
1.8-3.4 Å (1.8-3.6 Å for NOEs involving HN protons) and 1.8-5.0 Å
(1.8-6.0 Å for NOEs involving methyl groups); 0.6 Å was added to the
upper bounds of the strong and medium NOEs involving methyl groups. No
constraint was included for the zinc-binding site during the early
stages of the calculation. After verification that, in each structure,
the ligands were always disposed approximately in a tetrahedral
geometry, the zinc atom was subsequently incorporated into the
calculations using six distance restraints and one valence angle
restraint involving the cysteine residues and the explicit zinc atom
(Omichinski et al., 1990). H149 was not constrained to bind the zinc
atom, since our previous results suggested that the coordination could
be mediated through a hydroxyl group (Urbani et al., 1998). Protein
backbone hydrogen-bonding restraints (d(NH-O) = 1.6-2.9 Å, d(N-O) =
2.4-3.6 Å) within areas of regular secondary structure were introduced
during the final stages of refinement using standard NMR criteria based
on backbone NOEs and ³J(HN-Hα) coupling constants, supplemented with
secondary ¹³C shifts. The φ, ψ and χ1 torsion angle restraints were
derived from homo- and heteronuclear three-bond coupling constant data,
employing as minimum ranges ±25°, ±40° and ±30°, respectively. The
structures were calculated by simulated annealing with the program
X-Plor 3.851 (Brunger, 1993) on an SGI O2 R10000 platform, using a
protocol described by Omichinski et al. (1997). Figures and statistical
analysis were generated using the program InsightII (Molecular
Simulations Inc.).
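The NOE distance classes described above can be written as a small lookup. This is a sketch of the rule as stated in the text, not the authors' X-Plor input: the function name is hypothetical, the 3.6 Å medium HN bound is a reading of a garbled figure in the scan, and applying the 0.6 Å methyl correction on top of the HN bound when both apply is an assumption.

```python
from typing import Tuple

LOWER_BOUND = 1.8  # Å, common lower bound for all classes
UPPER = {"strong": 2.8, "medium": 3.4, "weak": 5.0}  # Å
UPPER_HN = {"strong": 3.0, "medium": 3.6}  # Å, NOEs involving HN protons
WEAK_METHYL_UPPER = 6.0  # Å, weak NOEs involving methyl groups
METHYL_CORRECTION = 0.6  # Å, added to strong/medium methyl NOEs


def noe_bounds(strength: str, involves_hn: bool = False,
               involves_methyl: bool = False) -> Tuple[float, float]:
    """Return (lower, upper) distance bounds in Å for an NOE class."""
    if strength not in UPPER:
        raise ValueError(f"unknown NOE class: {strength!r}")
    if strength == "weak":
        upper = WEAK_METHYL_UPPER if involves_methyl else UPPER["weak"]
    else:
        upper = UPPER_HN[strength] if involves_hn else UPPER[strength]
        if involves_methyl:
            upper += METHYL_CORRECTION  # assumed to stack with the HN bound
    return (LOWER_BOUND, round(upper, 1))
```

For instance, `noe_bounds("strong", involves_methyl=True)` gives bounds of 1.8-3.4 Å, and `noe_bounds("weak", involves_methyl=True)` gives 1.8-6.0 Å.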

Brookhaven Protein DataBank 

The coordinates of the final 20 simulated annealing structures have been
deposited in the Brookhaven Protein DataBank, accession code 1bt7.

The hepatitis C virus NS3 proteinase plays an essential role in processing of HCV nonstructural precursor polyprotein. To 
detect its processing activity, we developed a simple trans-cleavage assay. Two recombinant plasmids expressing the NS3 
proteinase region and a chimeric substrate polyprotein containing the NS5A/5B cleavage site between maltose binding 
protein and protein A were co-introduced into Escherichia coli cells. The proteinase processed the substrate at the single 
site during their polyprotein expression. Deletion analysis indicated that the functionally minimal domain of the NS3 
proteinase was composed of 146 amino acids, 1059 to 1204. We isolated several cDNA clones encoding the functional 
domain of the NS3 proteinase from the sera of patients chronically infected with HCV and determined their proteinase activity 
by this trans-cleavage assay. Both active and inactive clones existed in the same patients. Comparative sequence analyses 
of these clones suggested that certain point mutations seemed to be related to the loss of proteolytic activity. This was 
confirmed by back mutation experiments. Among the critical mutations, Pro-1168 to Thr and Arg-1135 to Gly were intriguing. 
These amino acids, which are situated near the oxyanion hole, seem to be essential for maintaining the conformation of the 
active center of the NS3 proteinase. © 1998 Academic Press 



INTRODUCTION 

Hepatitis C virus (HCV) is the major etiological agent of
posttransfusion non-A, non-B hepatitis worldwide (Choo et al., 1989; Kuo
et al., 1989). HCV infection results in mild and acute liver disease,
but chronic infections are common and may eventually develop into
cirrhosis or hepatocellular carcinoma (Saito et al., 1990). Although
interferons are currently used for the treatment of chronic hepatitis,
their efficacy is limited to a small portion of patients owing to
insufficient suppression of HCV replication. Therefore, another reliable
anti-HCV agent is necessary to control HCV hepatitis.

HCV has positive-strand RNA approximately 9400 nu- 
cleotides long which encodes a single polyprotein of 
about 3010 amino acids (aa) (Choo etaL 1989, 1991; Kato 
et ai, 1990; Takamizawa et ai., 1991). Since its genomic 
organization is similar to those of flaviviruses and pesti- 
viruses, HCV is classified as a member of the family 
Flaviviridae (Miller and Purcell, 1990; Takeuchi et ai, 
1990). Following the 5'-untranslated region, the viral 
structural proteins are located at the amino (N)-terminal 
region of the polyprotein in the order of core, El and E2. 
After being translated as a precursor polyprotein, the 
structural proteins are processed by a host cell signal 

Sequence data from this article have been deposited with GenBank 
under Accession Nos. AB013620-AB013651. 
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peptidase(s), perhaps with no apparent involvement of 
virus-coded proteinase. The nonstructural (NS) proteins, 
which represent the essential machinery for viral repli- 
cation, are located in the carboxyl (C)-terminal region in 
the order of NS2-NS3-NS4A-NS4B-NS5A-NS5B (Rice, 
1996). In contrast to the cleavage of the structural pro- 
teins by the host enzyme, the NS proteins are processed 
by virus-coded proteinases. The HCV has two protein- 
ases, NS2/3 proteinase and NS3 serine proteinase. The 
cleavage at NS2-NS3 is mediated by the former protein- 
ase encoded within a region composed of the C-terminal 
portion of the NS2 gene and the N-terminal portion of the 
NS3 gene. Most likely this enzyme is a zinc-dependent 
metalloproteinase whose His-952 and Cys-993 were in- 
volved in catalysis (Grakoui et ai, 1993a; Hijikata et ai. 
1993a). This putative metalloproteinase partially overlaps 
with the NS3 serine proteinase and presumably auto- 
cleaves the NS2/NS3 junction. The NS3 serine protein- 
ase was shown to cleave the NS3/NS4A, NS4A/NS4B, 
NS4B/NS5A, and NS5A/NS5B junctions (Manabe et ai, 
1994). The active site of this proteinase is composed of 
three highly conserved aa residues, His-1083, Asp-1107, 
and Ser-1165, which are well known as a catalytic triad of 
the serine proteinase family (Bazen and Fletterick, 1 990; 
Bartenschlager ef ai, 1993). It was confirmed .that the 
Ser-1165 in the NS3 protein was essentia! for cleaving 
the downstream portion of the polyprotein by using in 
vitro transcription-translation systems and some mammalian 
cell culture systems (Chambers et al., 1990; Grakoui 
et al., 1993b,c; Hijikata et al., 1993b; Tomei et al., 
1993). Comparison of the sequences around each cleav- 
age site revealed the unique substrate specificity of the 
proteinase: Cys or Thr at the P1 position, Ser or Ala at the 
P1' position, and Asp or Glu at the P6 position (Grakoui 
et al., 1993b). Among other NS proteins, NS4A, a 54-
residue amphipathic peptide, was shown to act as a 
cofactor for the NS3 proteinase interacting with the N- 
terminal portion of the enzyme (Bartenschlager et al., 
1995). Recently, the crystal structures of NS3 serine pro- 
teinase were reported from two groups. The tertiary 
structure of the proteinase was revealed to adopt a 
chymotrypsin-like folding, and its unique conformational 
aspects including a zinc-binding site and its complex 
formation with an NS4A peptide were elucidated (Kim et 
al., 1996; Love et al., 1996). 
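
The consensus described above (Cys or Thr at P1, Ser or Ala at P1', Asp or Glu at P6) can be written as a simple check. The sketch below is illustrative only: the function name is hypothetical, and the two example peptides are hand-translated from the pMCP-C2 and pMCP-A1 linker oligonucleotides listed under Materials and Methods, which is an assumption about the reading frame rather than a sequence stated in the text.

```python
def matches_ns3_consensus(p_side: str, p_prime_side: str) -> bool:
    """Check a candidate site against the consensus quoted in the text.

    p_side:       residues N-terminal to the scissile bond, written
                  N- to C-terminal, so the last letter is P1 and the
                  sixth from the end is P6.
    p_prime_side: residues C-terminal to the bond; the first letter is P1'.
    """
    if len(p_side) < 6 or len(p_prime_side) < 1:
        return False  # not enough residues to inspect P6 and P1'
    p6, p1, p1_prime = p_side[-6], p_side[-1], p_prime_side[0]
    return p6 in "DE" and p1 in "CT" and p1_prime in "SA"


# Hand-translated linker peptides (assumed reading frame, see lead-in).
print(matches_ns3_consensus("DDIVCC", "SMSY"))  # wild-type-like site
print(matches_ns3_consensus("DDIVCA", "SMSY"))  # P1 Cys replaced by Ala
```

The check only encodes the three positions named in the consensus; it says nothing about cleavage efficiency or cofactor dependence.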

Since the viral serine proteinase is an attractive target 
for antiviral therapy, several assay systems of NS3 pro- 
teinase using in vitro transcription-translation systems 
and some mammalian cell culture systems have already 
been developed. Previously, we constructed an enzy- 
matic assay system using one of our original HCV 
clones, D51, which displays functional activity of the 
purified enzyme in vitro, and studied the characteristics 
of the proteinase (Mori et al., 1996). We also reported the 
importance of the N-terminal part of NS3 for its protein- 
ase activity (Mori et al., 1997). In parallel with these 
previous studies, we newly cloned HCV proteinase 
genes from HCV-positive sera of chronic hepatitis pa- 
tients into a plasmid of Escherichia coli and examined 
their ability to cleave a coexpressed substrate fusion 
protein containing the NS5A/5B cleavage site between 
maltose binding protein (MBP) and protein A. It was 
found that the isolated NS3 proteinase clones had a variety 
of activities and some of them were inactive. In this 
study, we identified critical point mutations at Pro-1168 
and Arg-1135, which lead to loss of proteinase activity, 
and discuss how these amino acids contribute to the 
processing activity from a structural point of view. 

RESULTS 

Construction of trans-cleavage assay system 
in E. coli 

FIG. 1. Construction of trans-cleavage assay system in E. coli. (A) Schematic representation of the recombinant substrates and the HCV NS3 
proteinases produced by the expression plasmids. (B) A substrate (MCP-C2, lanes 1, 3, 5, 7, and 9; MCP-A1, lanes 2, 4, 6, 8, and 10) and an enzyme 
clone (MKC3, lanes 1 and 2; MKC4, lanes 3 and 4; D51, lanes 5 and 6; D51S1165A, lanes 7 and 8; vector only, lanes 9 and 10) were coexpressed in 
double transformants of the substrate and the enzyme expression vectors by IPTG induction. The proteolytic activity was analyzed by SDS-PAGE 
followed by CBB staining. The substrate (68 kDa) and the products (43 and 29 kDa) are indicated by arrows. (C) Western blotting of the same sample 
as (B) using HRP-conjugated goat IgG anti-rabbit IgG. Only the substrate and the processed C-terminal product (68 and 29 kDa), which contained a 
protein A domain, were detected. The details are given under Materials and Methods. 

To evaluate the activity of isolated HCV NS3 proteinase 
clones, we developed an assay system which detects 
the proteinase activity by cotransformation of E. coli 
with expression plasmids containing a cloned NS3 proteinase 
and a recombinant substrate. First, our proteinase-active 
original cDNA clones pMKC3, pMKC4, and 
pD51, which encode the residues 900-1260 of the viral 
polyprotein, were introduced into E. coli together with 
substrate expression plasmid pMCP-C2, which encodes 
a fusion protein containing the NS5A/5B cleavage se- 
quence between MBP and protein A junctions (Fig. 1A). 
As shown in Fig. 1B, we found at the CBB staining level 
that the substrate (68 kDa) was cleaved into two 
polypeptides, MBP (43 kDa) and protein A (29 kDa). The 
processing activity was confirmed by Western blotting 
analysis using HRP-conjugated IgG (Fig. 1C). N-terminal 
sequence analysis of the protein A band produced by the 
proteinase revealed that the substrate was indeed pro- 
cessed at the NS5A/5B cleavage site (data not shown). 
When the Ser-1165 of the enzyme was replaced with Ala 
(Figs. 1B and 1C, lanes 7) or the P1 Cys residue of the 
recombinant substrate was replaced with Ala (Figs. 1B 
and 1C, lanes 2, 4, 6, 8, and 10), the cleavage was not 
observed. These results indicated that this assay is use- 
ful for screening for functional cDNA clones with protein- 
ase activity. 

Deletion analysis of the proteinase region 

Using this E. coli trans-cleavage assay, we then deter- 
mined the minimal region to maintain the proteinase 



activity. A series of N- and C-terminal deletion mutants 
from the D51 clone were constructed and subjected to 
the trans-cleavage assay (Figs. 2A-2C). Although the 
cleavages by the proteinase region (900-1260) and 
(1059-1214) were not efficient for detection by CBB stain- 
ing (Fig. 2B), they were clearly detected by Western 
blotting (Fig. 2C). The N-terminal deletion experiments 
(lanes 2, 5, 7-14) indicated that the N-terminal border 
essential for trans cleavage of the substrate at the 5A/5B 
site was Val-1059. On the other hand, the C-terminal 
deletion experiments (lanes 2-6, 15) indicated that the 
functional C-terminal border was Thr-1204. From these 
results, the minimal NS3 proteinase region was nar- 
rowed down to the region between aa residues 1059 and 
1204. To detect the proteinase activity sensitively, we 
adopted Western blotting analysis in the following exper- 
iments. 

cDNA cloning of NS3 proteinase region from a 
patient and determination of its activity 

Using RT-PCR methods, we isolated cDNA fragments 
of the HCV genome coding NS3 proteinase region 1027- 
1260 from two patients' sera (N and U). From each serum, 
we obtained several clones and examined them individ- 
ually for their proteinase activity (Fig. 3). Several clones 
did not cleave the substrate (Fig. 3, clones MKC2, N-A1, 
N-A3, and U-2). Since the expression of the NS3 enzyme 
was detected in each assay sample except for clone 
N-A1 (data not shown), the cDNA clones MKC2, N-A3, 
and U-2 were considered not functional. 

FIG. 2. Mapping of the minimal NS3 region required for cleavage at the NS5A/NS5B site. A series of N- and C-terminal deletion mutants from D51 
proteinase clone were constructed and subjected to the trans-cleavage assay. (A) Schematic representation of the deletion mutants used in the assay. 
The numbers indicated on the left side correspond to the lane numbers in (B) and (C). (B) The processing activities of each deletion mutant were 
determined by SDS-PAGE followed by CBB staining. (C) The substrate and the processed C-terminal product, which contained a protein A domain, 
were detected by Western blotting using HRP-conjugated goat IgG anti-rabbit IgG. 

FIG. 3. Determination of processing activities of NS3 proteinase 
cDNA clones obtained from HCV-infected patients' sera. HCV NS3 
proteinase cDNAs encoding the aa sequence from 1027 to 1260 of the 
HCV precursor polyprotein were cloned into the expression plasmid as 
described under Materials and Methods. Each clone was coexpressed 
with the substrate expression plasmid pMCP-C2 (Fig. 1A) in E. coli and 
its processing activity was determined. Both precursor and processed 
polyproteins containing a fused protein A domain were detected by 
Western blotting with HRP-conjugated IgG. Clones N-A1-A4 and N-B1-
B14 were isolated from the serum of patient N. Also, clones U-1-11 
were isolated from another sole serum source, patient U. 

To study the correlation between their aa sequences 
and their cleavage activities, we analyzed DNA se- 
quences of each isolated clone. In the total nucleotide 
sequences, 1.18 and 0.27% of base substitutions from the 
consensus sequence were found in the clones derived 
from sera N and U, respectively (data not shown). These 
mutation rates account for 0.85% of the combined total 
sequences. However, 85% of the substitutions occurred 
at the third codon position, and aa changes were not frequent. 
Figure 4 shows aa sequences of each clone. The se- 
quences of clone N-B5, N-B6, N-B9, N-B11, and N-B13 
were the same as that of N-A2, and the sequence of 
N-B12 was the same as that of N-B3. Similarly, se- 
quences of U-3, U-6, U-7, and U-11 were identical to that 
of the U-1 clone. From the comparative alignment, it was 
revealed that the sequences could be classified into two 
groups by their serum source, N or U, at six positions (at 
1115, Pro and Ser; at 1148, Asn and Thr; at 1151, Ser and 
Ala; at 1196, Val and Ile; at 1222, Ala and Thr; at 1239, Lys 
and Arg, respectively). In addition to such diversity, each 
clone had a few extra minor point mutations. However, 
the majority of such minor mutations did not affect pro- 
teinase activity (Fig. 3). Among four inactive clones, clone 
N-A1 had a single base pair deletion which made a stop 
codon at aa 1079 (Fig. 5A). The appearance of a stop 
codon was consistent with the finding that the enzyme 
expression was not detected in clone N-A1 (data not 
shown). The aa sequence of its back mutant N-A1N, 
instead of the sequence of N-A1, is shown in Fig. 4. The 
other inactive clones, MKC2, N-A3, and U-2, had a replacement 
of Arg-1135 by Gly, Pro-1168 by Thr, and His-
1083 by Leu, respectively. These point mutations seemed 
to cause the loss of activity of the NS3 enzyme. 

Analysis of the inactive proteinase clones 

The comparative alignment revealed that the aa se- 
quence of clone U-2 was identical to that of clone U-1 
except for His-1083. Therefore, it was concluded that the 
loss of the proteinase activity was due to the point 
mutation at His-1083 to Leu. Since His-1083 is one of the 
catalytic triad residues (His-1083, Asp-1107, Ser-1165), it 
was clear that the point mutation at this site caused the 
loss of proteinase activity (Hijikata et al., 1993a). 

To examine whether the aa substitutions in other in- 
active clones caused the loss of activity we next con- 
structed a back mutant for each inactive clone. A corre- 
sponding nucleotide was inserted to correct the non- 
sense mutation in clone N-A1 and a single base pair 
substitution was introduced in MKC2 and N-A3, respectively 
(Fig. 5A). When the back mutant clones N-A1N, 
MKC2N, and N-A3N were introduced in E. coli together 
with the substrate expressing plasmid, cleavages of the 
recombinant substrate were observed in all cases (Fig. 
5B). These results confirmed that each predicted aa 
change caused the loss of proteinase activity. The im- 
munoblot analysis using anti-NS3 polyclonal antibody 
showed that each enzyme, except for the clone N-A1, 
was expressed normally in E. coli (Fig. 5B, bottom), eliminating 
the possibility that the failure of detection of the 
proteinase activity was due to the enzyme instability or 
low-level expression. To further analyze the contribution 
of positive charge at the side chain of position 1135 to 
the processing activity we prepared additional mutants 
from clone D51, in which Arg-1135 was substituted by 
Lys (D51R1135K) or Gln (D51R1135Q) as well as Gly 
(D51R1135G). Interestingly, it was found that Gln as well 
as Lys could substitute for Arg. However, when the res- 
idue was changed to Gly, the D51 clone lost its activity, 
like the MKC2 clone (Fig. 5B). This finding indicates that 
some factor other than a positive charge at position 1135 
may contribute to the NS3 proteinase activity. Furthermore, 
a mobility shift of the enzyme was observed in the 
mutants at position 1135, suggesting the importance of this 
aa position for enzyme conformation (Fig. 5B, bottom). 

FIG. 4. Comparative alignment of the deduced amino acid sequences of the NS3 proteinase genes. MKC3, MKC4, and N-A1-A4 were initially obtained 
as 1196-bp DNA fragments encoding the aa sequence from 900 to 1260 of the HCV precursor polyprotein. The others were obtained as 716-bp DNA 
fragments encoding the aa sequence from 1027 to 1260 of the HCV precursor polyprotein. Although MKC3 and MKC4 are identical within the NS3 region, 
they are different within the NS2 region. The clones isolated from the same patient's serum are indicated by brackets. N-A1N is a proteinase-positive 
revertant of N-A1, which was mutated by a single base pair (G/C) insertion at the starred position (Fig. 5). Amino acid residues predicted to be null mutations 
are indicated in bold capitals. The residues predicted to form a catalytic triad (H1083, D1107, and S1165) are indicated by #. 



DISCUSSION 

We developed a trans-cleavage assay of HCV NS3 
proteinase which detected its specific cleavage activity 
in E. coli. By using a coexpressed proteinaceous substrate, 
the cleavage activity was easily detected by SDS-PAGE 
and Western blotting. Using this trans-cleavage assay, 
we determined the minimal proteinase region in the 
N-terminal third of the NS3 gene by constructing a series 
of deletion mutants of the D51 clone. Previously, the 
proteinase domain was identified within region 1049- 
1215 (Tanji et al., 1994) and we also used the region 
1050-1214 as an active NS3 proteinase for its characterization 
(Mori et al., 1996). In this study, the analyses of 
additional deletion mutants revealed that the minimal 
proteinase region was mapped between Val-1059 and 
Thr-1204 of HCV precursor protein, which comprises 
146 aa residues (Fig. 2). The tertiary structure 
of the proteinase was revealed to adopt a chymotrypsin-like 
folding (Kim et al., 1996; Love et al., 1996). It 



was reported that positions of secondary structure elements 
were well matched to those of chymotrypsin, although 
some β strands did not superimpose with the 
equivalent strands in other chymotrypsin-like proteinases. 
Both the terminal residues of the minimal region, 
Val-1059 and Thr-1204, are located at the N-terminal end 
of β strand A1 and the end of the C-terminal α helix, respectively 
(Kim et al., 1996; Love et al., 1996). NS3(1059-1204) 
covers the minimal region containing all secondary 
structures which are conserved among chymotrypsin- 
like proteinase family members. Our results suggest that 
this region forms a core domain essential for processing 
activity. 

FIG. 5. Mutation analysis of proteinase-defective clones. (A) Predicted mutation sites of proteinase-defective clones. Underlines indicate replaced 
aa residues and nucleotides. (B) To confirm the predicted null mutation sites shown in (A), back mutation experiments were performed. The Opal 
mutation (indicated as Op in the figure) in N-A1 was repaired by single G/C base pair insertion. The aa sequence of the resultant clone designated 
as N-A1N is shown in Fig. 4. The Thr-1168 in N-A3 and the Gly-1135 in MKC2 were replaced by Pro (N-A3N) and Arg (MKC2N), respectively. 
Furthermore, the Arg-1135 in D51 (an active clone) was replaced by Gly (D51R1135G), Lys (D51R1135K), or Gln (D51R1135Q) residues. D51S1165A, in 
which the Ser-1165 was replaced by Ala residue, was used as inactive control. The proteinase activities of these clones were determined by 
trans-cleavage assay in E. coli. The bottom part of this figure shows the expression of NS3 proteinase analyzed by Western blotting using rabbit 
polyclonal anti-NS3 antibody. 

In the course of studying the activity of HCV proteinase 
cDNA clones obtained from HCV-infected patients, we 
found both functionally active and inactive clones of NS3 
proteinase by the trans-cleavage assay (Fig. 3). It is 
interesting that some inactive clones (N-A1, N-A3 from 
serum N and U-2 from serum U) existed together with 
active clones in the same patient's serum. The sequencing 
analysis of the cloned NS3 genes revealed that the 
base substitutions amounted to 0.85% in the total nucle- 
otide sequences (data not shown). Among the substitu- 
tions, 85% of the cases occurred at the third codon position, 
which did not change the aa sequences. Since the anal- 
ysis of the sequence of HCV genome might inevitably 
involve the possibility of nucleotide change due to PCR 
error, we are not able to entirely exclude such possibility. 
However, these biased nucleotide mutations may reflect 
the characteristics of an error-prone RNA-dependent 
RNA polymerase and the lack of an associated repair 
mechanism in the viral replication, which may cause the 
quasispecies of the HCV genome within infected individuals 
(Bukh et al., 1995). Among such quasispecies, some 
mutations may happen to change an aa residue(s) criti- 
cal for proteinase activity. In this study, we identified four 
cases of such critical mutations in sera of patients chron- 
ically infected with HCV. 

Among the four inactive clones, N-A1 was found to 
have a nonsense mutation at position 1079 (Fig. 5A). 
Since the translation of the HCV polyprotein may be 
terminated at the stop codon, N-A1 is considered to be a 
clone defective in HCV replication. In another inactive 
clone, U-2, a point mutation occurred at a histidine res- 
idue of the catalytic triad (His-1083) and directly made 



the proteinase inactive. In the third case of N-A3, muta- 
tion at Pro-1168 was revealed to induce inactivation of 
the enzyme. The proline residue follows the "oxyanion- 
stabilizing loop" (Leu-1161 to Ser-1165), which constitutes 
an active center. It is possible that this proline residue is 
structurally critical and its substitution by threonine distorted 
the conformation of the main chain within the loop 
and caused loss of the enzyme activity. In the fourth 
inactive clone, MKC2, Arg-1135 was replaced by glycine, 
which was located on a loop between β strands A2 and 
B2 in the C-terminal domain and was positioned just 
behind the oxyanion hole of the active center (Kim et al., 
1996; Love et al., 1996). The side chain at position 1135 
may contribute to the processing activity by certain interactions 
with amino acids constituting the "oxyanion-stabilizing 
loop." Since a glutamine residue could substitute 
for Arg-1135 as shown in Fig. 5B, a positive charge 
at this position may not be very important for the inter- 
actions. The volume of the side chain might be critical in 
interactions such as hydrogen bonds, and the smallest 
side chain of glycine might lead to loss of such interac- 
tion causing the enzyme to be inactive in the clone 
MKC2. 

The trans-cleavage assay described here could be 
used for analyses to measure the activity of NS3 protein- 
ase clones and to identify quasispecies of the proteinase 
sequences. The system would also be useful for revealing 
the amino acids critical in inactive HCV clones. Accumulation 
of such information should be helpful for 
understanding the relationship between the structure 
and the activity of the enzyme. 

MATERIALS AND METHODS 

Construction of substrate expression vectors 

The expression plasmid pMCPI, which encodes E. 
coli MBP and Staphylococcus aureus protein A in 
tandem, is a derivative of pMAL-c2 (New England 
BioLabs, Inc., MA) and pRIT2 (Pharmacia Biotech, Inc., 
Uppsala, Sweden). Plasmid pMCPI also contains multiple 
cloning sites (HindIII to XbaI site of pUC19) between 
the MBP gene and the protein A gene. Substrate 
expression plasmids, pMCP-C2 and pMCP-A1, 
which encode fusion proteins containing appropriate 
aa sequences around the NS3 proteinase cleavage 
site between the MBP and protein A junctions (Fig. 
1A), were constructed by inserting phosphorylated 
linkers into the HindIII-XbaI site of pMCPI. The synthetic 
oligonucleotide pairs used in these constructions 
are as follows: for plasmid pMCP-C2, 
5'-AGCTTGGCGACGACATCGTCTGCTGCTCAATGTCCTACT-3' 
and 5'-CTAGAGTAGGACATTGAGCAGCAGACGATGTCGTCGCCA-3'; 
for plasmid pMCP-A1, 
5'-AGCTTGGCGACGACATCGTCTGCGCCTCAATGTCCTACT-3' 
and 5'-CTAGAGTAGGACATTGAGGCGCAGACGATGTCGTCGCCA-3'. 
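
As a sanity check on the linker design, the following sketch tests whether each oligonucleotide pair is complementary outside its 4-nt overhang and translates the top strand. It is a minimal illustration, not part of the original method: the function names are hypothetical, and the reading frame (starting at the third base of the top strand) is an assumption chosen because it yields an NS5A/5B-like sequence; the frame actually used in the vector is not stated in the text.

```python
# Standard genetic code, built from the usual compact TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str, frame: int = 0) -> str:
    """Translate complete codons of seq starting at offset `frame`."""
    return "".join(
        CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3)
    )


def anneals(top: str, bottom: str, overhang: int = 4) -> bool:
    """True if the strands pair perfectly once each 5' overhang is removed."""
    return reverse_complement(top[overhang:]) == bottom[overhang:]


# Oligonucleotide pairs copied from the text (top strand, bottom strand).
LINKERS = {
    "pMCP-C2": ("AGCTTGGCGACGACATCGTCTGCTGCTCAATGTCCTACT",
                "CTAGAGTAGGACATTGAGCAGCAGACGATGTCGTCGCCA"),
    "pMCP-A1": ("AGCTTGGCGACGACATCGTCTGCGCCTCAATGTCCTACT",
                "CTAGAGTAGGACATTGAGGCGCAGACGATGTCGTCGCCA"),
}

ASSUMED_FRAME = 2  # assumption, see lead-in

for name, (top, bottom) in LINKERS.items():
    print(name, anneals(top, bottom), translate(top, ASSUMED_FRAME))
```

Under the assumed frame, the two linkers differ at a single codon (TGC versus GCC) immediately upstream of the Ser codon, which is consistent with the P1 Cys to Ala substitution described in the Results.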

Cloning of NS3 proteinase genes 

The expression plasmids pWNH71 and pWB298 each 
contain a 13 N-terminal amino acid leader sequence 
derived from a nitrite hydratase gene, which are driven 
under the control of the trc promoter. This leader se- 
quence possesses a consecutive stretch of histidine 
(His) residues which allows the fusion protein to be 
purified with ease in a single step by metal chelating 
affinity chromatography. 

HCV RNA molecules were purified from 50 μl of an 
HCV patient's serum (a gift from Dr. N. Hayashi) by using 
RNAzol B (Tel-Test, Inc., Texas) and subjected to cDNA 
synthesis, followed by the reverse transcription-poly- 
merase chain reaction (RT-PCR). To amplify the NS3 
region, the following oligonucleotides were used as 
nested PCR primers: forward primers, YH351 
(5'-TTAAGCTTCTCGGTCCGCTCATGGTRCTCCARGC-3') and YH357 
(5'-TTAAGCTTGCGCCYATCACGGCCTAYTCCC-3'); backward 
primers, YH354 (5'-TAAGATCTARGCRGCAACRGAIGGRAGGACG-3') 
and KY390 (5'-TAGGATCCTARGCRGCAACRGAIGGRAGGACG-3'). 
(Underlines denote the restriction 
sites used for subcloning, and nucleotides in 
bold type indicate the stop codon.) To amplify the cDNA 
fragment encoding aa residues 900 to 1260 in the HCV 
polyprotein, a primer set of YH351 and YH354 was used. 



To amplify the cDNA fragment encoding aa residues 
1027 to 1260 in the HCV polyprotein, a primer set of 
YH357 and KY390 was used. 

The HindIII-BglII fragment and the HindIII-BamHI 
fragment of the PCR products were ligated into the 
HindIII-BglII site of pWNH71 and the HindIII-BamHI 
site of pWB298, respectively. To screen enzymatically 
active NS3 proteinase clones, as described later, the 
ligated plasmid DNA and the substrate expression 
plasmid pMCP-C2 were co-introduced into E. coli DH5 
cells. Since the substrate and the enzyme expression 
vectors have different replication origins and different 
antibiotic-resistant markers, both plasmids can be 
maintained under selective pressure. Ampicillin and 
kanamycin double-resistant colonies were selected as 
the transformants harboring the two plasmids, and the 
plasmid DNAs were subjected to restriction endonu- 
clease analysis. 

Detection of NS3 proteinase activity in E. coli 
expression system 

Recombinant E. coli strains harboring the two plasmids 
were precultured overnight in LB broth (Difco) containing 
ampicillin (50 μg/liter) and kanamycin (30 μg/liter) 
at 30°C with shaking. The cultured cells were inoculated 
at 1/50 dilution into the same broth and incubated 
further at 37°C. When the OD650 reached 0.8, 
isopropyl-β-D-thiogalactopyranoside (IPTG) was added at the 
final concentration of 0.5 mM, and the culture was further 
incubated for 4 h. After the cultured cells were harvested 
and suspended in 0.85% NaCl, a portion of the suspen- 
sion was mixed with Laemmli's sample buffer and ana- 
lyzed by sodium dodecyl sulfate-polyacrylamide gel 
electrophoresis (SDS-PAGE). 

SDS-PAGE was carried out in a 14% acrylamide slab 
gel and the gel was stained with Coomassie brilliant blue 
(CBB). In addition, Western blot analysis was carried out 
after SDS-PAGE. The processed product from the C-terminal 
region of the substrate protein, containing the protein 
A-fused polypeptide, was detected with horseradish 
peroxidase (HRP)-conjugated goat IgG anti-rabbit IgG. To 
confirm the expression of the NS3 protein in E. coli cells, 
rabbit anti-NS3 IgG prepared from a rabbit immunized 
with purified recombinant protein (aa 900 to 1260; expressed 
in E. coli) was used as the first antibody and 
HRP-conjugated goat IgG anti-rabbit IgG as the second. 

Mutagenesis 

Site-specific mutagenesis was accomplished by using 
a PCR-based method (Ho et al., 1989). All constructs 
were confirmed by sequencing analysis. The PCR primers 
used were as follows. Vector primers: forward primer, 
5'-TGTGAGCGGATAACAATTTCGGATC-3', and backward 
primer, 5'-TCGGCCGCCCGACTATCACCGCCC-3'. Mutagenesis 
primers: for MKC2 (G1135 → R), 
5'-CTTTACCTGGTCACGAGACATGCTGATGTC-3' and 
5'-GACATCAGCATGTCTCGTGACCAGGTAAAG-3'; 
for N-A1 (amber W1079; 1 G/C bp insertion), 
5'-AACGGCGTGTGTTGGACTGTCTACCATGG-3' and 
5'-ACCATGGTAGACAGTCCAACACACGCCGTT-3'; 
for N-A3 (T1168 → P), 
5'-GCTCTTCGGGTGGTCCGCTGCTTTGCC-3' and 
5'-GGCAAAGCAGCGGACCACCCGAAGAGC-3'; 
for D51 (R1135 → G), 
5'-CTTTACTTGGTCACGGGACATGCTGATGTC-3' and 
5'-GACATCAGCATGTCCCGTGACCAAGTAAAG-3'; 
for D51 (R1135 → K), 
5'-CTTTACTTGGTCACGAAACATGCTGATGTC-3' and 
5'-GACATCAGCATGTTTCGTGACCAAGTAAAG-3'; 
for D51 (R1135 → Q), 
5'-CTTTACCTGGTCACGCAACATGCTGATGTC-3' and 
5'-GACATCAGCATGTTGCGTGACCAGGTAAAG-3'; 
for D51 (S1165 → A), 
5'-TGAAGGGTTCCTGCGGTGGTCC-3' and 
5'-GGACCACCGCAGGAACCCTTCA-3', forward and backward 
primers, respectively. 
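
The primer pairs targeting position 1135 can be checked for internal consistency with a short script. The sketch below is illustrative: it assumes that the first oligonucleotide of each pair is the sense strand read in frame from its first base (an inference, since this frame gives Leu-Tyr-Leu-Val-Thr followed by the residue of interest), so that the sixth codon corresponds to residue 1135. The helper names and the small codon lookup are hypothetical conveniences.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Only the four codons needed here (standard genetic code).
CODON_NAMES = {"AGA": "Arg", "GGA": "Gly", "AAA": "Lys", "CAA": "Gln"}

# Forward/backward primer pairs for position 1135, copied from the text.
PRIMER_PAIRS = {
    "MKC2 G1135->R": ("CTTTACCTGGTCACGAGACATGCTGATGTC",
                      "GACATCAGCATGTCTCGTGACCAGGTAAAG"),
    "D51 R1135->G": ("CTTTACTTGGTCACGGGACATGCTGATGTC",
                     "GACATCAGCATGTCCCGTGACCAAGTAAAG"),
    "D51 R1135->K": ("CTTTACTTGGTCACGAAACATGCTGATGTC",
                     "GACATCAGCATGTTTCGTGACCAAGTAAAG"),
    "D51 R1135->Q": ("CTTTACCTGGTCACGCAACATGCTGATGTC",
                     "GACATCAGCATGTTGCGTGACCAGGTAAAG"),
}


def codon_at(seq: str, codon_index: int) -> str:
    """Return the codon_index-th codon (0-based) of seq read in frame 0."""
    start = 3 * codon_index
    return seq[start:start + 3]


for label, (forward, backward) in PRIMER_PAIRS.items():
    complementary = reverse_complement(forward) == backward
    residue = CODON_NAMES[codon_at(forward, 5)]  # sixth codon, assumed 1135
    print(label, complementary, residue)
```

Each pair is fully complementary, and under the stated frame assumption the sixth codon encodes the intended replacement residue.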

Construction of deletion mutants 

All deletion mutants were constructed by the PCR. The 
HindIII-BglII fragment and the HindIII-BamHI fragment 
of PCR products were ligated into the HindIII-BglII site of 
pWNH71. All constructs were confirmed by sequencing 
analysis. 

Summary 

During replication of hepatitis C virus (HCV), the final 
steps of polyprotein processing are performed by a 
viral proteinase located in the N-terminal one-third of 
nonstructural protein 3. The structure of NS3 proteinase 
from HCV BK strain was determined by X-ray crystallography 
at 2.4 Å resolution. NS3P folds as a trypsin-like 
proteinase with two β barrels and a catalytic triad 
of His-57, Asp-81, Ser-139. The structure has a substrate-binding 
site consistent with the cleavage specificity 
of the enzyme. Novel features include a structural 
zinc-binding site and a long N-terminus that interacts 
with neighboring molecules by binding to a hydropho- 
bic surface patch. 

Introduction 

Hepatitis C virus (HCV) is the major etiologic agent of 
human parenterally and community-acquired non-A, 
non-B hepatitis (Choo et al., 1989). Chronic HCV infection 
is a global disease, and the number of carriers is esti- 
mated to be about 300 million. Chronic infection may 
lead to the development of chronic hepatitis, liver cirrho- 
sis, and hepatocellular carcinoma (reviewed by Hough- 
ton, 1996). In Europe and Japan, the disease is more 
prevalent than either hepatitis B virus or human immuno- 
deficiency virus infections. Protective vaccination is not 
available for HCV, and current treatments with interferon 
are successful only in a limited number of patients. 
Therefore, considerable attention has been focused in 
recent years on understanding HCV replication and ob- 
taining structural information about essential HCV pro- 
teins. 

The HCV virion has a positive-strand RNA genome that 
was cloned in 1989 (Choo et al., 1989). It is composed of 
about 9,400 nucleotides and contains a single large 
open reading frame encoding a polyprotein of 3010- 
3033 amino acid residues (Kato et al., 1990; Choo et al., 
1991; Takamizawa et al., 1991). The genetic organization 
of HCV (reviewed by van Doorn, 1994) is similar to that 



of flavi- and pestiviruses, and it was classified as a 
separate genus of the family Flaviviridae. Sequence 
analysis of HCV isolates reveals that HCV exists in many 
distinct variants. A total of six major genotypes and at 
least 11 subtypes have been recognized (Simmonds et 
al., 1994). 

The nonstructural (NS) proteins involved in replication 
of the HCV genome are released by the action of two 
proteinases: NS2-3 and NS3. NS2-3 proteinase is a zinc- 
dependent enzymatic activity that performs a single pro- 
teolytic cut to release the N-terminus of NS3 (Grakoui 
et al., 1993, Hijikata et al., 1993). The action of NS3 
proteinase (NS3P), which resides in the N-terminal one- 
third of the NS3 protein, then yields all remaining non- 
structural proteins: NS4A, NS4B, NS5A, and NS5B. The 
C- terminal two-thirds of the NS3 protein contain a heli- 
case. While the functional relationship of these two do- 
mains is unknown, the separately expressed proteinase 
and helicase domains of NS3 exhibit their respective 
activities in vitro (Suzich et al., 1993; D'Souza et al., 
1995; Steinkühler et al., 1996). 

The N-terminal domain of NS3 has been found to 
contain the catalytic motif of a trypsin-like serine proteinase 
(Miller and Purcell, 1990). The positions of amino 
acids His-57, Asp-81, and Ser-139 (numbering from the 
start of NS3) are strictly conserved among all HCV- 
derived sequences; their relative order and spacing in 
the sequence correspond to the catalytic triad of the 
trypsin family. However, the NS3P exhibits several other 
features that are highly unusual for a trypsin-like protein- 
ase: it is covalently attached to a helicase possessing 
NTPase activity, it requires a protein cofactor (NS4A), 
and displays sensitivity to divalent metal ions. Using in 
vitro transcription/translation systems, the NS4A protein 
(located immediately downstream of NS3 on the pol- 
yprotein) was shown to be required for cleavage at the 
NS3/NS4A, NS4A/NS4B, and NS4B/NS5A sites. The 
NS3/NS4A cleavage occurs rapidly and in cis (intramolecular 
event), while the others occur in trans (intermolecular 
events). NS4A also accelerates the rate of cleavage 
at the NS5A-NS5B junction (Bartenschlager et al., 
1994; Failla et al., 1994; Lin and Rice, 1995; Koch et al., 
1996). While NS4A may have several functions, it has 
been suggested that NS3 and NS4A proteins form a 
complex that is important for modulation of proteolytic 
activity (Hijikata et al., 1993; Lin et al., 1994). 
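
Two numbering schemes appear in the articles collected here: residues counted from the start of NS3 (His-57, Asp-81, Ser-139) and residues counted along the polyprotein (His-1083, Asp-1107, Ser-1165). The sketch below converts between them. The constant offset of 1026 is an inference from those three pairs, not a value stated in either article, and it may not hold for strains whose polyproteins differ in length upstream of NS3; the function names are hypothetical.

```python
# Inferred offset: 1083 - 57 = 1107 - 81 = 1165 - 139 = 1026 (assumption).
NS3_OFFSET = 1026


def to_ns3(polyprotein_pos: int) -> int:
    """Convert a polyprotein residue number to NS3 numbering."""
    return polyprotein_pos - NS3_OFFSET


def to_polyprotein(ns3_pos: int) -> int:
    """Convert an NS3 residue number to polyprotein numbering."""
    return ns3_pos + NS3_OFFSET


# Catalytic triad in NS3 numbering, mapped to polyprotein numbering.
for name, pos in {"His": 57, "Asp": 81, "Ser": 139}.items():
    print(f"{name}-{pos} (NS3) = {name}-{to_polyprotein(pos)} (polyprotein)")

# Minimal proteinase region reported in polyprotein numbering (1059-1204).
start, end = to_ns3(1059), to_ns3(1204)
print(f"minimal region in NS3 numbering: {start}-{end}, {end - start + 1} aa")
```

Under this offset, the residues whose substitution inactivated the enzyme in the patient-derived clones, Arg-1135 and Pro-1168, correspond to positions 109 and 142 in NS3 numbering.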

The proper understanding of these unique features of 
the NS3P as a molecular target can be achieved only 
in the context of a high resolution atomic structure, 
which in turn would also accelerate design and develop- 
ment of new drugs to treat HCV infection. However, 
crystallization of NS3P has proven difficult owing to its 
poor solubility and tendency to aggregate. To circum- 
vent this problem, we examined a series of NS3P do- 
mains of variable sizes, derived from different HCV 
strains. Here, we report the three-dimensional structure 
of the NS3P domain from the strain BK, which belongs to 
the HCV genotype 1b. HCV BK NS3P shows significant 
amino acid sequence identity with representative HCV 
genotypes: 89% with HCV H (genotype 1a), 96% with 
HCV J (1b), 71% with HCV J6 (2a), 71% with HCV J8 
(2b), and 78% with HCV 3A (3a). It shares 33% sequence 
identity with the NS3 proteinase from the recently discovered 
hepatitis G virus (Linnen et al., 1996). Crystals 
of HCV BK NS3P were obtained using a 189 amino acid 
recombinant fragment purified from Escherichia coli. 
This structure represents the first view of a processing 
enzyme from the flavivirus family and opens new possi- 
bilities for design of drugs targeting HCV replication. 

Results and Discussion 

Overall Structural Features 

HCV NS3 proteinase (NS3P) is folded into two six-stranded 
β barrels, similar to those of trypsin-like serine 
proteinases (Figures 1A and 2A-2B). However, apart 
from the three catalytic triad residues (His-57, Asp-81, 
Ser-139) of NS3P and the sequence of GXSGG at Ser-139, 
it shares virtually no sequence similarity with proteinases 
possessing a trypsin-like fold (Figure 1B). In 
our discussions, features of NS3P are described while 
it is compared with other classes of trypsin-like proteinases. 
These classes include cellular enzymes such as 
pancreatic proteinases (e.g., elastase; Meyer et al., 
1988) and bacterial proteinases (e.g., α-lytic proteinase; 
Fujinaga et al., 1985), all having serine as the active-site 
nucleophile. Also included are viral proteinases with a 
nucleophile of either serine (e.g., Sindbis virus core protein; 
Choi et al., 1991) or cysteine (e.g., rhinoviral 3C 
proteinase; Matthews et al., 1994). These examples were 
chosen because they contain an uncharged S1 specificity 
pocket, which for elastase and α-lytic enzymes is 
also relatively small. NS3P is expected to have a small 
uncharged specificity site because the consensus 
amino acid at substrate P1 is either cysteine or threonine 
(Grakoui et al., 1993). 

The number of amino acids at the NS3P N-terminus, 
before the first β barrel, is significantly larger than in 
most proteinases in their active states. These initial 30 
residues extend away from the protein and form several 
β strands that interact with neighboring molecules (Fig- 
ure 2). The possible significance of this interaction is 
discussed later. The existence of secondary structure 
in the N-terminus is reminiscent of the short β strand 
found in many cellular proteinases (e.g., residues 19-22 
of elastase) but differs from the case of the α helix found 
in the picornaviral 3C proteinases. 

NS3P displays a spatial arrangement of strands within 
its β barrels similar to other trypsin-like proteinases; 
however, the loops connecting these strands are rela- 
tively short (Figures 2 and 3). In this regard, NS3P paral- 
lels the economical use of amino acids found in Sindbis 
core protein (Figure 3B), or possibly even a viral 2A 
proteinase (the presumed trypsin-like cysteine protein- 
ase upstream of 3C in picornaviruses and enteroviruses, 
but with unknown structure). The total number of resi- 
dues comprising the two β barrel motifs (from the first 
β barrel strand to the end of the last β barrel strand) is 
about 140 for NS3P, only slightly larger than 133 in 
Sindbis, and similar to the predicted number of about 
140 for 2A proteinases (Bazan and Fletterick, 1988); this 
number exceeds 170 residues in the cellular protein- 
ases. As a consequence of this economy, several loops 
common in the cellular proteinases are absent in NS3P, 
such as the calcium-binding loop (βD1 to βE1), the autol- 
ysis loop (βA2 to βB2), and the so-called methionine 
loop (βB2 to βC2). In the bacterial and picornaviral pro- 
teinases, this latter loop (but in pancreatic proteinases, 
the βE1 to βF1 loop) is positioned such that it can inter- 
act with residues on the P side of the substrate (Figure 
3B); the absence of a corresponding loop in NS3P is 
consistent with its apparent lack of substrate recogni- 
tion over P2 to P5 (Bartenschlager et al., 1995; Grakoui 
et al., 1993). 

After the final strand βF2, there is one turn of α helix 
(171-174) that closely matches the first turn of the 
C-terminal helix found in cellular proteinases. Following 
this turn, the polypeptide is disordered in two of three 
monomers in our asymmetric unit. In the monomer dis- 
cussed here, the chain turns back toward βE1 to form 
a short antiparallel β-interaction between βE0 (181-182) 
and βE1b (75-76; see Figure 2); this interaction may be 
the result of crystal packing only. Questions of flexibility 
and conformation at the C-terminus of NS3P are relevant 
because this crystal structure represents an enzyme 
which in vivo is permanently attached (via its C-terminus) 
to a helicase, the two entities together defining the NS3 
protein. The exact range of residues within NS3 that 
corresponds to the folded helicase is unknown, as is 
the length of the polypeptide linker between NS3P and 
the helicase. 



Zinc-Binding Site 

In cell-free transcription/translation experiments, pro- 
cessing of the HCV NS polyprotein by NS3P is stimulated 
by the addition of Zn2+. Inductively coupled plasma mass 
spectroscopy of purified recombinant NS3P reveals an 
equimolar ratio of NS3P and zinc (Z. H., unpublished 
data). Thus, the existence of a functionally relevant zinc 
in HCV NS3P was expected. From our NS3P homology 
models, the close proximity of conserved cysteines 97, 
99, 145, and histidine 149 suggested a zinc-chelation 
site, because these residues are frequently members of 
structural zinc sites (Vallee and Auld, 1990; Schwabe 
and Klug, 1994), and because disulfide linkages are un- 
likely in an intracellular proteinase. The crystal structure 
confirms this idea; these three cysteines together pro- 
vide a partial tetrahedral geometry around the zinc ion 
(see Figure 2B), with cysteine sulfur to zinc distances 
of 2.0-2.5 Å. In two of three monomers in the asymmetric 
unit of the crystal, the fourth member of the tetrahedral 
coordination of the zinc is a water molecule; this water 
is within hydrogen-bonding distance of the His-149 side 
chain. In the third monomer, His-149-Nδ is 4.0 Å away 
from the zinc and thus does not play a direct chelation 
role, but the imidazole isolates the zinc from solution 
and is positioned to coordinate the metal readily (Figure 
2B). Point mutations of Cys-97, -99, -145, and His-149 
(to alanine) show that removal of any one has a negative 
impact on NS3P processing (Hijikata et al., 1993; Stemp- 
niak et al., submitted). Thus, His-149 may be an integral 
part of zinc coordination at least during the initial folding 
of NS3P. 
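The distinction drawn above, between ligands within bonding distance of the zinc (2.0-2.5 Å) and a histidine nitrogen about 4.0 Å away, amounts to a simple distance test. The sketch below is an illustration only: the coordinates are hypothetical values invented to reproduce those distances, not atoms from the NS3P structure, and the 2.6 Å cutoff is an assumed threshold rather than one stated in the text.

```python
import math

# Hypothetical coordinates in angstroms, chosen only to illustrate the check;
# they are not taken from the NS3P structure.
ZINC = (0.0, 0.0, 0.0)
LIGAND_ATOMS = {
    "Cys-97 SG": (2.30, 0.00, 0.00),
    "Cys-99 SG": (-0.77, 2.17, 0.00),
    "Cys-145 SG": (-0.77, -1.08, 1.88),
    "His-149 N": (-1.30, -1.90, -3.30),
}


def coordinating_atoms(metal, atoms, cutoff=2.6):
    """Return {atom name: distance} for atoms within `cutoff` angstroms of the metal."""
    hits = {}
    for name, xyz in atoms.items():
        d = math.dist(metal, xyz)  # Euclidean distance (Python 3.8+)
        if d <= cutoff:
            hits[name] = d
    return hits


bound = coordinating_atoms(ZINC, LIGAND_ATOMS)
for name, d in sorted(bound.items()):
    print(f"{name}: {d:.2f} A")
```

With these toy values the three cysteine sulfurs pass the cutoff and the histidine nitrogen does not, which mirrors the situation described for the third monomer.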

The zinc site serves to anchor the turn at βD2-βE2 
(containing Cys-145 and His-149) to the interbarrel loop 







[Figure 1B: structure-based sequence alignment of HCVBK, HRV14, SVCP, ALP, and ELA, with β strands A0-D0 and A1-F2 marked] 



Figure 1. Trace of HCV NS3P Structure and Its Sequence Alignment with Other Proteases 

(A) Side-by-side stereo view of the Cα trace for HCV NS3P (residues 2-184), drawn using Xfit (McRee, 1992). This is monomer 1 of the 
asymmetric unit. The trace is labeled at intervals for clarity. The N-terminal 30 residues (before the first β barrel) are extended away from the 
core of the protein. 

(B) Alignment of HCV BK NS3P sequence with other proteinases based on superposition of crystal structures (as in Figure 3B). Residues of 
β strands are underlined. The general location of each β strand among all sequences is indicated by a dashed arrow; labeling of these strands 
follows the common convention, except for the extra N-terminal strands of HCV, which are labeled A0-D0. Catalytic triad residues are marked 
by an asterisk. Excess residues not shown are indicated in brackets. Abbreviations for proteins are: SVCP, Sindbis virus core protein; ALP, 
α-lytic proteinase; ELA, porcine elastase; HRV14, human rhinovirus type 14 3C proteinase; HCVBK, hepatitis C virus, BK strain NS3 proteinase. 
Amino acid numbering is for either HCVBK (top) or ELA (bottom). 

Figure 2. Overall Folding of HCV NS3P and Assembly of Molecules in the Crystal, Emphasizing N-Terminal Configuration and Exchange 
Displayed by Ribbons (Carson, 1991). 

(A) View into the active site of monomer 1 in the asymmetric unit with catalytic triad residues labeled by the single amino acid code. Secondary 
structural elements are color-coded yellow for β strand, green for α helices, and blue for coil. β strands in the trypsin-like barrels (A1-F2) are 
labeled in the standard convention. Catalytic triad residues are shown in sphere and cylinder representation, with spheres color-coded green 
for carbon, blue for nitrogen, and red for oxygen. 

(B) Trace of monomer 1 color-coded as in (A), rotated relative to (A) for viewing of the structural zinc-binding site. The zinc (gray sphere) is 
tetrahedrally coordinated by the sulfurs (yellow spheres) of cysteines 97, 99, and 145. Histidine 149, though not within bonding distance, is 
seen poised to complete the tetrahedral coordination (see text). 

(C) Architecture of the N-terminal strand exchange. The N-terminal residues of monomer 1 (in green) extend away from the molecule permitting 
two antiparallel β strands, A0 of monomer 1', in blue (generated by the crystallographic 3-fold from monomer 1), and C0 of monomer 3 (in 
yellow), to lie on its surface. This interaction involves nonpolar side chains from the blue and yellow β strands being buried against a 
hydrophobic patch on the surface of the green molecule (see text). A short parallel β sheet interaction is also seen, involving D0 and B0 
strands of monomers 1 and 3, respectively. 

(D) Hexamer generated from monomers 1 and 3 using the crystallographic (3-fold) and noncrystallographic (2-fold) symmetry elements. In 
this view, the crystallographic 3-fold axis is vertical. The crystallographic trimer of monomer 1 (blue, red, and yellow) associates with the 
crystallographic trimer of monomer 3 (purple, light blue, and green) using N-terminal strand exchange (see text). Each molecule within the 
hexamer accepts two antiparallel β strands from neighboring molecules; e.g., the green molecule accepts the purple and yellow strands. The 
noncrystallographic 2-fold symmetry mates are red and green, yellow and purple, and blue and light blue. 



(containing Cys-97 and Cys-99). Such a location for zinc, 
remote from the active site, implies a structural rather 
than a catalytic role for the metal; its effects on polyprot- 
ein processing are probably linked to accurate NS3P 
folding or post-folding stability of the enzyme or both. 



However, perturbations at the zinc site could conceiv- 
ably affect the active-site conformation, because the 
two are linked directly through strands βD2 and βE2. 

Figure 3. Active Site of NS3P and Comparison with Other Trypsin-like Serine Proteinases 

(A) Side-by-side stereo view of the active site and surroundings of HCV NS3P. A Cα trace is given for clarity, with purple coloring the oxyanion 
loop, green for the short α helix preceding the active-site serine, and red for the section of βE2 that interacts with substrate. Side chains, 
whose atoms are colored yellow (carbon), blue (nitrogen), or red (oxygen), are labeled by the single-letter amino acid code. Shown here is 
the active-site triad (H57, D81, S139) and residues defining the S1 specificity pocket (L135, F154, A157). Overall, this S1 pocket is relatively 
small and nonpolar. 

(B) Superposition of HCV NS3P (green) and Sindbis core protein SCP (purple). This superposition is based on structural alignment of β structure 
within the barrels, which gives a root-mean-square difference on Cα's of 1.5 Å. The catalytic triad residues are shown for each molecule 
(thicker lines) to illustrate their similar spatial arrangement, as well as the torsional deviation of Asp-81 in NS3P. The βB2-βC2 loop (red) and 
βE1-βF1 loop (yellow), often determinants of P2-P5 substrate specificity in pancreatic/bacterial proteinases, are unusually short in NS3P as 
well as SCP. The extended N-terminus of NS3P has been omitted for clarity. 

During our derivative searches (see Experimental Pro- 
cedures), we found that many metals and heavy atom 
compounds bind at this zinc site in the crystal. For the 
case of mercury binding, we observed a disruption of 
the structure in this region, essentially through a disor- 
dering of the polypeptide around Cys-97-99. Other ex- 
periments showed that the zinc ion is lost when mercury 
binds to NS3P (presumably at cysteine) and that this 
binding inhibits the enzyme (Z. Hostomska, unpublished 
data). Copper has also been found as a potent inhibitor 
of NS3P (Han et al., 1995). Copper binding in our crystal 
occurs primarily at the zinc site, but substitution of zinc 
with copper is not yet confirmed. In the cases above, 
inhibition may result from an altered active site that in 
turn arises from destabilization of the zinc site during 
metal binding. 

The zinc-coordinating amino acids of NS3P are not 
present in other members of the Flaviviridae family such as 
yellow fever virus or bovine diarrhea virus but are found 
in the recently discovered GB viruses (Simons et al., 
1995) and hepatitis G virus (Linnen et al., 1996), which 
are more closely related to HCV. The zinc site of NS3P 
may play a role analogous to the disulfide common 
among cellular proteinases (136-201 in elastase), which 
also anchors the βD2-βE2 turn to the interbarrel loop. 
In picornaviral 3C proteinases, which possess neither 
disulfides nor a structural zinc, stability in this region 
may be provided by the N-terminal α helix, which packs 
against the βD2-βE2 turn and connecting loop simulta- 
neously. 

Alignment of picornaviral 2A proteinase sequences 
(Yu and Lloyd, 1992; Bazan and Fletterick, 1988) has 
revealed two conserved motifs, CXC within the interbar- 
rel loop and CXH at the end of βD2, which are similar 
in residue type and sequence position to the chelation 
motifs in NS3P (CXC and CXXXH). Several biochemical 
and biophysical studies of 2A proteinases have shown 
that zinc is an integral tightly bound component of the 
structures, required for correct folding and stability (but 
not involved in catalysis) and probably chelated by resi- 
dues of the two motifs (Yu and Lloyd, 1992; Sommer- 
gruber et al., 1994; Voss et al., 1995). Thus, in the picor- 
naviral 2A proteinases there is probably a zinc site with 
general location, coordination geometry, and stabilizing 
function that parallels the NS3P case. 

Active Site 

In the crystal structure of NS3P, the catalytic triad resi- 
dues (His-57, Asp-81, Ser-139), the "oxyanion-stabiliz- 
ing loop" (135-139), and strand βE2 forming one side 
of the specificity pocket, together have the same relative 
spatial positions as in other trypsin-like proteinases (Fig- 
ure 3). Mutation experiments have identified the pre- 
sumed NS3P triad residues as essential to proteolytic 
activity (Hijikata et al., 1993). 

The imidazole of His-57, expected to extract a proton 
from nucleophile Ser-139 during the enzymatic reaction, 
is oriented toward Ser-139 but is not close enough to 
form the hydrogen bond often observed in proteinase 
structures. The stretch of residues from 57 to 63 has 
greater mobility than most loops in the structure, as 
indicated by higher temperature factors; however, the 
conformation we have built here is consistent with our 
experimental density maps and resembles the helical 



turn found in other proteinases. Flexibility in this region 
may be related to a lack of structural anchoring com- 
pared with cellular proteinases, where a conserved di- 
sulfide links Cys-58 to Cys-42 on strand βB1. 

The side chain of Asp-81, expected to provide charge 
stabilization for His-57 after deprotonation of Ser-139, 
is oriented away from His-57 in the crystal structure of 
NS3P. However, the Ser-139-Cα to Asp-81-Cα distance 
of 11.4 Å is close to that in rhinovirus 3C proteinase 
(11.4 Å) and also Sindbis core protein (10.9 Å), while 
only slightly longer than that in cellular proteinases (ap- 
proximately 10 Å). Asp-81 forms an ion pair with Arg- 
155 (on βE2), the latter being a conserved amino acid 
among HCV sequences. Arg-155 in NS3P corresponds 
to conserved Ser-214 of other proteinases, where the 
hydroxyl of the serine hydrogen-bonds to the carboxyl- 
ate of triad member Asp-102. Interestingly, the rotation 
of Asp-81 away from His-57 observed in NS3P, and the 
Asp-81-Arg-155 interaction, are paralleled in the crystal 
structure of 3C proteinase from hepatitis A (Allaire et 
al., 1994), where the aspartate points away from histidine 
while interacting with a lysine on strand βF2. In the 
hepatitis A case, however, it was suggested that the 
aspartic acid is not crucial for histidine charge stabiliza- 
tion because the cysteine nucleophile is easier to depro- 
tonate than serine. In NS3P, only minor readjustments 
of amino acids 80-82 would position the Asp-81 car- 
boxylate closer to the His-57 side chain, as probably 
required for catalysis. We propose that a classic cata- 
lytic triad configuration for His-57/Asp-81/Ser-139 will 
exist during substrate cleavage by NS3P in vivo and 
in vitro and that the positional deviation of Asp-81 we 
observe is a consequence of an apo-enzyme structure 
or the crystallization conditions or both. 

The sequence around Ser-139 in NS3P (Gly-Ser-Ser- 
Gly-Gly) follows the conserved GXSGG motif seen in 
trypsin-like serine (and cysteine) proteinases, and the 
polypeptide backbone conformation of the oxyanion- 
stabilizing loop (135-139) is nearly identical to that in 
other proteinases (see Figures 3B and 6). Alignment of 
all backbone atoms of NS3P residues 135-139 with the 
corresponding atoms of elastase (191-195) gives a root- 
mean-square difference of 0.71 Å. The βC2-βD2 loop, 
always variable among proteinases, is unique in NS3P 
because it contains one turn of α helix (residues 131- 
134) just prior to the oxyanion loop. This helix could 
add conformational stability at the binding site, perhaps 
analogous to the conserved disulfide found in many 
proteinases (involving Cys-191) that anchors the oxya- 
nion loop to the βE2-βF2 turn. 
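A root-mean-square difference such as the 0.71 Å quoted above is computed over matched atom pairs. The following is a minimal sketch under the assumption that the two coordinate sets have already been superposed; finding the optimal superposition (for example with the Kabsch algorithm) is a separate step that is not shown, and the text does not say which program was used. The coordinates are hypothetical toy values.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square difference between matched (x, y, z) atom positions.

    Both lists must be in the same atom order and already superposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical toy coordinates: the second set is the first shifted 0.5 A along x.
loop_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
loop_b = [(x + 0.5, y, z) for (x, y, z) in loop_a]
value = rmsd(loop_a, loop_b)
print(f"rmsd = {value:.2f} A")
```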

The cleavage performed by NS3P is between poly- 
protein substrate residues Cys/Ser (at 4B/5A and 5A/ 
5B sites), Cys/Ala (at 4A/4B), or Thr/Ser (at 3/4A). Our 
structure shows a specificity pocket that is shallow and 
nonpolar, formed primarily by the side chains of invariant 
residues Phe-154, Ala-157, and Leu-135 (Figure 3A). Ho- 
mology modeling had predicted this type of S1 pocket, 
involving at least Phe-154 (Pizzi et al., 1994; see also 
"Crystallographic Phasing Strategies," below). Phe-154 
corresponds to position 213 of the cellular proteinases, 
where it is typically a small hydrophobic amino acid. 
Picornavirus 3C proteinases have histidine at 213; thus, 
they show a closer steric resemblance to NS3P at this 
position. The Leu-135 side chain forms the bulk of the 
remaining S1 pocket and is located approximately at 
position 190 of cellular proteinases, often a small polar 
amino acid and specificity determinant. Ala-157 and Val- 
167 of NS3P correspond to positions 216 and 226, re- 
spectively, of cellular proteinases, which are usually gly- 
cine to accommodate long P1 side chains; elastase, 
which recognizes a small nonpolar P1 side chain and has 
Val-216/Thr-226, most closely parallels NS3P at these 
positions. 

Figure 4. Model of a Bound Polypeptide Substrate Containing the NS3/4A Junction 

Side-by-side stereo view of the NS3/4A substrate (color-coded: green, C; red, O; blue, N) modeled into the active site of HCV NS3P, which 
is covered by a molecular surface in white dots (generated by the MS program [Connolly, 1983] using a 1.6 Å probe radius). The Cα trace of 
NS3P is shown in yellow, with the oxyanion-stabilizing loop in purple and the extended strand βE2 in red. NS3P side chains (color-coded: 
yellow, C; red, O; blue, N) are shown for residues comprising the S1 specificity pocket (L135, F154, A157), the potential P6 recognition elements 
(R161, K165), and active-site serine (S139). The substrate is labeled according to the standard P/P' convention, while the HCV residues are 
labeled in the single-letter amino acid code. 

Substrate Modeling in NS3P 

Hydrolysis of the NS3/4A peptide bond is the first (in 
cis) cleavage event by NS3P, involving a threonine at the 
P1 position of the substrate; for subsequent cleavage 
events, the P1 residue is cysteine. A model of the Mi- 
chaelis complex between a polypeptide substrate con- 
taining the NS3/4A cleavage site and NS3P was con- 
structed (Figure 4), based on the known binding mode 



of polypeptide inhibitors of trypsin-like enzymes (Read 
and James, 1986). This model suggests the possibility 
of a favorable interaction between the OH (or SH) of the 
P1 side chain and the electron-rich π clouds on the 
aromatic ring of nearby Phe-154. A sulfhydryl-aromatic 
interaction was earlier proposed as a possible NS3P 
substrate recognition mechanism (Pizzi et al., 1994). In- 
teraction between a threonine hydroxyl and a tyrosine 
aromatic ring has been observed (Ji et al., 1992; Liu 
et al., 1993). Numerous observations and calculations 
support the notion that hydrogen bonding between pro- 
ton donors and aromatic rings plays a significant role 
in protein stability (Burley and Petsko, 1986; Levitt and 
Perutz, 1988). 

Strand βE2 of NS3P, expected to form β sheet antipar- 
allel hydrogen bonds with the substrate over P2-P3, is 
more extended than in most proteinases. Its length most 
closely resembles the corresponding β strand in Sindbis 
core protein. We hypothesize that the NS3P enzyme- 
substrate β-interaction at P2-P3 continues for another 
three residues to P6 (Figure 4), for two reasons. First, the 
consensus acidic residue at substrate P6 could interact 
with the Arg-161 and Lys-165 of NS3P, which are rela- 
tively isolated on the extended βE2-βF2 turn. Secondly, 
a continuous P2-P6 main-chain β-interaction might 
compensate for the apparent lack of P2-P5 side chain 
to enzyme interactions, which results from the relative 
shortness of the βB2-βC2 and βE1-βF1 loops in NS3P. 

Figure 5. Solvent-Accessible Surface of 
NS3P Monomer 1, Colored by Hydropho- 
bicity, Such That Nonpolar Residues Are 
White, Charged Residues Are Deep Magenta, 
and Polar Residues Are Medium Shades of 
Magenta 

The central white region is the hydrophobic 
patch (approximately 400 Å²) discussed in the 
text. In the crystal, there are two N-terminal 
strands from neighboring molecules bound 
to this patch, namely βC0 from monomer 3 
(blue ribbon) and βA0 from monomer 1' (green 
ribbon). Strand βC0 buries Cys-16 and Ile-18, 
while strand βA0 buries Ile-3 and Ala-5, into 
the center of the hydrophobic surface patch. 
Residues Ser-20 of βC0 and Ser-7 of βA0 lie 
at the edge of the interface and hydrogen- 
bond to monomer 1. Also shown in yellow (at 
bottom horizon of surface) is the P3' residue 
of a modeled substrate polypeptide (see Fig- 
ure 4), included here as a directional refer- 
ence to the active site. 

N-Terminal Configuration 

A quite unexpected feature of the crystal structure of 
NS3P is the configuration of its long N-terminus (the 
first 30 amino acids), which extends away from the pro- 
tein and contains β strands that interact with neigh- 
boring molecules (see Figure 2). Monomers 1 and 3 of 
the asymmetric unit ultimately assemble into a hexamer 
via a combination of crystallographic symmetry (a trimer 
is generated from either monomer) and noncrystallo- 
graphic symmetry (association of these two trimers). The 
order of this assembly is unknown. A similar hexamer is 
formed from six copies of monomer 2 by (32) crystallo- 
graphic symmetry operators. However, weaker electron 
density at the N-termini in this hexamer precludes the 
building of polypeptide chains between copies of mono- 
mer 2. 

One consequence of the strand exchange is that a 
hydrophobic patch on the surface of each molecule is 
covered by two relatively nonpolar β strands from 
N-termini of a different neighbor (Figures 2C and 5). 
Specifically, monomer 3 contributes βC0 (residues 
16-20; sequence CIITS) to the patch on monomer 1, 
with Cys-16 and Ile-18 buried and Ser-20 partially buried 
at the interface boundary. Antiparallel to this strand, 
and also lying against the patch, is βA0 (residues 2-7; 
sequence PITAYS) from a crystallographically related 
copy of monomer 1, which buries Ile-3 and Ala-5 into 
the patch while partially burying Ser-7. An additional 
association, apart from the patch, involves a short two- 
stranded parallel β-interaction (βB0:βD0 between mo- 
nomers 1 and 3) that occurs in between trimers, where 
two traversing N-termini approach one another (Fig- 
ure 2C). 

The "hydrophobic patch" is a long narrow (approxi- 
mately 20 Å x 8 Å) shallow valley with a continuous 
nonpolar molecular surface area of about 400 Å², located 
mainly on the second β barrel, and composed of resi- 
dues primarily from strands βA2, βB2, and βD2 but also 
two residues from βA1 (Figure 5). Specifically, the patch 
is formed by the side chains of conserved residues Val- 
33, Val-35, Leu-44, Leu-94, Val-107, Leu-127, Ala-111, 
Val-113, Pro-115, Pro-142, Leu-144, Tyr-105 (ring por- 
tion), and Arg-109 (alkyl chain portion). 

Since hydrophobic interactions are often key compo- 
nents of molecular recognition, the association of the 
two β strands with the hydrophobic patch on NS3P may 
mimic one or more molecular interactions that occur 
during HCV polyprotein processing, rather than being 
an artifact of crystallization. The patch could qualify as 
a functionally important binding site because it is de- 
fined by residues from strand βD2, which contains the 
nucleophile, and by the antiparallel strands βA2/βB2, 
which together directly support and hydrogen-bond to 
the oxyanion loop. It is known that the NS4A domain 
(immediately downstream of NS3) binds to NS3P during 
processing, and this facilitates the cleavages by NS3P. 
Furthermore, the N-terminus of NS3P itself participates 
in the cofactor modulation of NS4A (Koch et al., 1996). 
Therefore, one possibility is that this hydrophobic patch 
forms part of a site through which the N-terminus of 
NS3P or the cofactor NS4A or both modulate the activity 
of NS3P. 

Figure 6. Section of the 2.7 Å Experimental Electron Density Map 

Side-by-side stereo view of NS3P (color-coded: yellow, C; red, O; blue, N) in the vicinity of the active site, superimposed on the 2.7 Å 
experimental SIRAS map (purple mesh), contoured at 1.5 σ. A clear definition of main chain, side chain, and, frequently, carbonyl direction 
exists throughout the majority of the map. 

Experimental Procedures 

Protein Expression and Purification 

HCV BK NS3 proteinase (residues 1-189) was expressed in E. coli 
as a soluble protein. The isolation of a pure protein involved three 
chromatographic steps: Fast Flow SP Sepharose, FPLC Mono-S, 
and Sepharose S200. The final preparation of HCV BK 1-189 mi- 
grated as a single entity on SDS-polyacrylamide gel electrophoresis 
in reducing and nonreducing buffer systems, with an apparent molec- 
ular mass of 20 kDa. The specific activity of the pure sample 
of HCV BK 1-189 was about 50 nmol/min/mg using a 5A/5B 15 
residue synthetic peptide substrate (C. Lewis, unpublished data). 
Automated Edman degradation gave an N-terminal sequence PI- 
TAYS; thus, the initiation methionine and alanine residues from the 
original sequence are missing. Details of protein purification and 
enzyme characterization will be published elsewhere. 

Protein Crystallization and X-Ray Data Collection 

Crystals were grown at 4°C by the hanging-drop vapor diffusion 
method, using plastic Linbro tissue culture plates. Aliquots (5 μl) of 
10 mg/ml of protein HCV BK 1-189 in 50 mM sodium acetate, 10 
mM dithiothreitol, 350 mM NaCl (pH 6.0) were mixed with 5 μl of 
reservoir containing 5% PEG 400, 3.5 M NaCl, 150 mM Tris-HCl (pH 
8.5). Crystals appeared after 2-3 weeks, and they grew as hexagonal 
rods with typical dimensions 0.1 x 0.1 x 0.6 mm. 

The crystals belong to space group R32 with hexagonal cell pa- 
rameters of a = b = 133 Å, c = 223 Å. There are three molecules 
per asymmetric unit, which gives a calculated solvent content of 
62%. The extreme sensitivity of heavy atom-soaked crystals to 
X-rays, plus the eventual use of synchrotron radiation, prompted a 
uniform application of cryo-cooling techniques (-170°C). In-house 
data were obtained from a 30 cm MAR imaging plate and processed 
with DENZO and SCALEPACK (Otwinowski and Minor, 1993), and 
complete data to 3.5 Å (derivatives) or 3.0 Å (native) were routinely 
obtained. Synchrotron sources at Photon Factory, Japan, and Euro- 
pean Synchrotron Radiation Facility, France, were ultimately used 
to obtain higher resolution native and derivative data (see Table 1). 
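The solvent content quoted above can be approximated with a Matthews-coefficient estimate. The sketch below rests on stated assumptions: the hexagonal setting of R32 has 18 symmetry-equivalent positions per cell, each molecule is taken as 20,000 Da (the apparent mass reported for HCV BK 1-189), and the conventional factor of 1.23 is used to convert the Matthews coefficient into a protein volume fraction. It is not necessarily how the published figure was derived.

```python
import math


def hexagonal_cell_volume(a: float, c: float) -> float:
    """Volume in cubic angstroms of a hexagonal cell (a = b, gamma = 120 degrees)."""
    return a * a * c * math.sin(math.radians(120.0))


def matthews(cell_volume: float, n_molecules: int, mass_da: float,
             protein_factor: float = 1.23):
    """Return (Matthews coefficient in A^3/Da, estimated solvent fraction)."""
    vm = cell_volume / (n_molecules * mass_da)
    return vm, 1.0 - protein_factor / vm


volume = hexagonal_cell_volume(133.0, 223.0)  # cell parameters from the text
n_molecules = 18 * 3                          # 18 positions x 3 molecules per asymmetric unit
vm, solvent = matthews(volume, n_molecules, 20000.0)  # 20,000 Da is an assumed mass
print(f"Vm = {vm:.2f} A^3/Da, solvent = {100 * solvent:.0f}%")
```

Under these assumptions the estimate comes out close to the reported value; the small difference depends mainly on the molecular mass used.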

Crystallographic Phasing Strategies 

During heavy-atom screening in house, most trials led to isomor- 
phous difference Patterson maps containing peaks at similar posi- 
tions; these peaks were also seen in several mercury and gold 
anomalous difference Pattersons. Compounds with a wide variety 
of binding mechanisms led to the same peaks. Nonisomorphism 
was a significant problem, and data from short soaks frequently 

Table 1. Crystallographic Data Statistics 

Parameters                    HgCl2/Native (1)   BMDA (a)          Native (2) 

Diffraction data 
Heavy atom (mM)               0.05               6.0               - 
Soaking time (hrs)            25.0               41.0              - 
Synchrotron facility (b)      Photon Factory     Photon Factory    ESRF 
Data collection device (c)    Fuji ip            Fuji ip           Mar ip 
Wavelength (Å)                1.0000             1.0064            0.995 
Maximum resolution (Å)        2.7                2.5               2.4 
Data completeness (%)         93.8               93.9              97.8 
I/σI at resolution limit      3.8                4.4               3.9 
Observations (No.)            93,997             135,383           96,285 
Unique reflections            19,945             24,998            29,437 
Rsym (d) (%)                  7.5                7.7               7.9 
Riso (e) (%)                  -                  16.5              - 

SIRAS statistics to 2.7 Å 
Rcullis (f)                   -                  0.56              - 
Rcullis (g) anomalous         -                  0.32              - 
FOM (h)                       -                  0.69              - 
solvent flattened FOM         -                  0.87              - 
rms f/E iso                   -                  2.34              - 
rms f''/E ano                 -                  2.86              - 
Sites (No.)                   -                  9                 - 

(a) 3,6-bis(mercurimethyl) dioxane acetate. 

(b) Beamline 18B, Photon Factory, Tsukuba City, Japan; Beamline 4, ESRF (European Synchrotron Radiation Facility), Grenoble, France. 

(c) Fuji ip: data measured using a Weissenberg camera in oscillation mode; Mar ip: Mar Research image plate system. Data from both systems 
were processed with DENZO and SCALEPACK (Otwinowski and Minor, 1996). 

(d) Rsym is the unweighted R factor on I between symmetry mates. 

(e) $R_{iso} = \sum \big|\,|F_{der}| - |F_{nat}|\,\big| \,/\, \sum |F_{nat}|$, where $F_{der}$ and $F_{nat}$ are derivative and native structure factor amplitudes, respectively. 

(f) $R_{cullis} = \sum \big|\,|F_{der} \pm F_{nat}| - F_{H(calc)}\,\big| \,/\, \sum |F_{der} \pm F_{nat}|$ for all centric reflections. 

(g) Rcullis anomalous was calculated for the top 25 percent largest Bijvoet differences. 

(h) FOM is the figure of merit. 



had to serve as "native" relative to longer soaks as "derivative." A 
total of three major sites per asymmetric unit was eventually de- 
duced from Patterson maps using HASSP (Terwilliger et al., 1987). 
Later, it was found that each site lay near a cluster of cysteines that 
formed a structural zinc site and that this region was disrupted by 
heavy-atom binding. The best in- house phases were obtained by a 
combination of isomorphousand anomalous data from HgCfe soaks. 
Heavy-atom site refinement using PHASES (Furey and Swamina- 
than, 1990) at 5 A gave an overall figure of merit, phasing power 
isomorphous/anomaJous, and centric R factor of 0.75, 2.5/3.0, and 
0.45, respectively. This was used to initiate solvent flattening (Wang, 
1 985) and to check the handedness of the sites. Phasing information 
beyond 5 A was poor, but the 5 A electron density maps showed 
well-defined and separated molecular envelopes, indicating three 
molecules per asymmetric unit. The heavy-atom positions were 
used to determine noncrystallographic operators, so that 3-fold av- 
eraging combined with phase extension to 3.5 A could be initiated 
using PHASES. While such maps could not be used for tracing, they 
did show features that suggested a trypsin-like structure, namely 
two globular domains, one of which resembled a thick-walled barrel. 
This permitted a positioning of our NS3P homology model into the 
unit cell. This model was based on rhinovims 3C proteinase and 
generated with LOOK (Molecular Applications Group, Stanford, CA). 
When higher resolution maps became available (see below), it was 
found that this model had been correctly positioned and could serve 
as a guide for building the structure. 

Data collected at Photon Factory from two mercury-soaked crystals,
using 3,6-bis(mercurimethyl)dioxane (BMDA) and HgCl2, provided the
higher resolution phases necessary to produce traceable maps. The
isomorphous signal from differences between BMDA and HgCl2 (HgCl2
being more native-like), combined with the anomalous signal from the
BMDA data, led to the best results (Table 1). For BMDA, a total of
six more sites per asymmetric unit was found by difference Fourier
methods; these were spatially close to the first set of three.
Phasing statistics indicated usable experimental phases to 2.7 Å.
Density maps calculated to 3.0 Å and 2.7 Å revealed a nearly complete
trace for the protein, with abundant side-chain information, and
confirmed that NS3P folds with a double β-barrel trypsin-like motif.
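
As an illustration of the isomorphous difference statistic quoted for
the BMDA data, the following Python sketch computes Riso from paired
derivative and native amplitudes, assuming the usual definition (sum
of absolute amplitude differences divided by the sum of native
amplitudes). The function name and the toy amplitudes are
hypothetical; they are not data from this study.

```python
def r_iso(f_der, f_nat):
    """Riso = sum(| |F_der| - |F_nat| |) / sum(|F_nat|) over paired reflections."""
    if len(f_der) != len(f_nat) or not f_nat:
        raise ValueError("amplitude lists must be paired and non-empty")
    numerator = sum(abs(abs(d) - abs(n)) for d, n in zip(f_der, f_nat))
    denominator = sum(abs(n) for n in f_nat)
    return numerator / denominator


# Hypothetical amplitudes for four reflections (arbitrary units).
toy_derivative = [110.0, 95.0, 60.0, 42.0]
toy_native = [100.0, 100.0, 50.0, 50.0]

print(f"Riso = {100 * r_iso(toy_derivative, toy_native):.1f}%")
```

In practice the two data sets would first be placed on a common scale;
that step is omitted here.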

Model Building and Refinement 

Based on 3 Å and 2.7 Å solvent-flattened electron density maps, a
model was built for one molecule of the asymmetric unit into the
region with best connectivity, using FRODO (Jones, 1978) and XFIT
(McRee, 1992). This model was then copied to the other two positions
via the previously refined noncrystallographic symmetry (NCS)
operators and adjusted locally where necessary. At this point, each 
monomer consisted of a trypsin-like core with two disconnected 
parallel elongated strands lying upon one surface, which repre- 
sented in some manner the N-terminus (approximately 29 residues). 
Side-chain density in these strands clearly defined a unique se- 
quence fit and direction, which was inconsistent with the strands 
belonging to the same (nearest) molecule. For example, density for
an approximately eight-amino acid β strand ending close to residue
30 (the start of the first barrel) could not be fit with the sequence
21-29 as expected; instead, it was fit clearly by 14-21 (most notably
at Cys-16) and running in the opposite direction. This situation, 
along with nearly continuous density extending between molecules 
(before 14 and after 21), led to the N-terminal strand-exchange 
structure of NS3P. 

Initial crystallographic refinement with XPLOR (Brunger, 1992b) and
manual readjustment of the model were done at 2.7 Å using the HgCl2
data (this being more native-like), with NCS restraints applied to
the three monomers of the trimer. Later, native data to 2.4 Å
resolution from European Synchrotron Radiation Facility was used, and
the three molecules were refined with only weak NCS restraints, the
weight chosen to optimize the free R value (Brunger, 1992a). XPLOR
treatment included simulated annealing ("slow-cooling" from
progressively lower temperatures as the model improved), positional
refinement, and then restrained individual temperature factor
refinement. Rebuilding was done from annealed-omit maps and (2Fo-Fc)
maps, with continual reference to the high quality experimental maps.
With the European Synchrotron Radiation Facility native data, it was
possible to construct a zinc site, based on 6 σ density in (Fo-Fc)
maps, between two neighboring surface loops where cysteines 97, 99,
and 145 were known to be located. The zinc site was eventually
refined with only nonbonded parameters (no explicit chelation bonds)
and remained stable as three sulfhydryls surrounding the zinc, with
sulfur-zinc distances of 2.0-2.5 Å. Verification of the N-terminal
trace and strand exchange was made for two molecules out of three in
the asymmetric unit (our monomers "1" and "3"). For monomer "1," the
only disordered residues are the last five (out of 188); this is the
molecule chosen for the Discussion. In the other two molecules,
density beyond residue 177 is poorly defined. Using only residues in
the β barrel motifs of each molecule, the root-mean-square difference
in Cα positions between monomers ranges from 0.5 Å for monomers 1-3
to 0.7 Å for monomers 1-2.
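
The root-mean-square comparison of monomers can be sketched in Python
as follows. This is a minimal illustration only: it assumes the two
sets of matched Cα coordinates have already been superimposed, and the
function name and toy coordinates are hypothetical rather than taken
from the structure.

```python
import math


def rms_difference(coords_a, coords_b):
    """Root-mean-square distance between matched (x, y, z) positions in Å."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be matched and non-empty")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Two hypothetical, already superimposed Cα traces of two residues each.
monomer_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
monomer_b = [(0.0, 0.0, 0.5), (1.0, 0.5, 0.0)]

print(f"rms difference = {rms_difference(monomer_a, monomer_b):.2f} Å")
```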

A total of 125 water molecules and seven chloride ions were 
eventually included during the refinement process of the trimer (5058 
protein atoms). The chlorides were chosen to replace those waters 
whose B factors had refined very low and which had locations near 
electrostatically unpaired basic side chains. The R factor is 22.5% 
over 8-2.4 Å (European Synchrotron Radiation Facility data; 25,000
reflections with I > 2.0 σ(I)), and the free R factor is 32% (2,000
reflections). Bond and angle deviations are 0.02 Å and 2.1°,
respectively, as determined by XPLOR using the Engh and Huber (1991)
parameters. PROCHECK (Laskowski et al., 1993) analysis of φ/ψ
angles indicates no residues in disallowed regions and two in
"generously allowed" regions. The average temperature factor is 17 Å²
for main-chain atoms and 20 Å² for side-chain atoms.
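
The distinction between the working R factor and the free R factor
(Brunger, 1992a) can be sketched as below: reflections flagged as
"free" are kept out of refinement and scored separately. The sketch
assumes that calculated amplitudes are already on the scale of the
observed ones; the function names and toy values are hypothetical.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |F_obs| - |F_calc| |) / sum(|F_obs|)."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    return numerator / sum(abs(o) for o in f_obs)


def work_and_free_r(reflections, free_flags):
    """Split (F_obs, F_calc) pairs by free flag and score each set."""
    work = [pair for pair, free in zip(reflections, free_flags) if not free]
    test = [pair for pair, free in zip(reflections, free_flags) if free]
    r_work = r_factor(*zip(*work))
    r_free = r_factor(*zip(*test))
    return r_work, r_free


# Hypothetical (F_obs, F_calc) pairs; the last one is the free reflection.
pairs = [(100.0, 90.0), (50.0, 60.0), (80.0, 80.0), (40.0, 30.0)]
flags = [False, False, False, True]

r_work, r_free = work_and_free_r(pairs, flags)
print(f"R = {100 * r_work:.1f}%, free R = {100 * r_free:.1f}%")
```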

Acknowledgments 

Correspondence should be addressed to Z. Hostomska. We wish 
to thank the following individuals: Drs. D. Knighton and C. Kissinger 
for help with data collection and analysis; M. Stempniak, B. Nodes, 
N. Tajima, S. Rahmati, B. Aust, and G. Hudson for excellent technical
assistance; Drs. S. Reich, E. Villafranca, P. Dragovich, and C. Lewis 
for valuable discussions; Dr. D. Matthews for critical reading of the 
manuscript; and Drs. H. Tsuge, M. Miyano, H. Ago, N. Watanabe, 
M. Suzuki, N. Sakabe, and E Inagaki for their assistance with data 
collection at Photon Factory. This study was in part supported by 
the Sakabe project of Tsukuba Advanced Research Alliance. 

Received September 13, 1996; revised September 24, 1996. 
References 

Allaire, M., Chernaia, M.M., Malcolm, B.A., and James, M.N.G.
(1994). Picornaviral 3C cysteine proteinases have a fold similar to
chymotrypsin-like serine proteinases. Nature 369, 72-76. 

Bartenschlager, R., Ahlborn-Laake, L., Mous, J., and Jacobsen, H.
(1994). Kinetic and structural analyses of hepatitis C virus
polyprotein processing. J. Virol. 68, 5045-5055.

Bartenschlager, R., Ahlborn-Laake, L., Yasargil, K., Mous, J., and
Jacobsen, H. (1995). Substrate determinants for cleavage in cis and 
in trans by the hepatitis C virus NS3 proteinase. J. Virol. 69, 198-205. 

Bazan, J.F., and Fletterick, R.J. (1988). Viral cysteine proteases are 
homologous to the trypsin-like family of serine proteases: structural 
and functional implications. Proc. Natl. Acad. Sci. USA 85, 7872- 
7876. 

Brunger, A. (1992a). Free R-value: a novel statistical quantity for 
assessing the accuracy of crystal structures. Nature 355, 472-475. 

Brunger, A. (1992b). XPLOR Version 3.1: A System for X-Ray
Crystallography and NMR. (New Haven, Connecticut: Yale University).

Burley, S.K., and Petsko, G.A. (1986). Amino-aromatic interactions 
in proteins. FEBS Lett. 203, 139-143. 

Carson, M. (1991). Ribbons 2.0. J. Appl. Cryst. 24, 958-961. 

Choi, H.-K., Tong, L., Minor, W., Dumas, P., Boege, U., Rossmann,
M.G., and Wengler, G. (1991). Structure of Sindbis virus coat protein
reveals a chymotrypsin-like serine proteinase and the organization
of the virion. Nature 354, 37-43.

Choo, Q.L., Kuo, G., Weiner, A.J., Overby, L.R., Bradley, D.W., and
Houghton, M. (1989). Isolation of a cDNA clone derived from
blood-borne non-A, non-B viral hepatitis. Science 244, 359-362.

Choo, Q.L., Richman, K.H., Han, J.H., Berger, K., Lee, C., Dong, C.,
Gallegos, C., Coit, D., Medina-Selby, R., Barr, P.J., Weiner, A.J.,
Bradley, D.W., Kuo, G., and Houghton, M. (1991). Genetic organiza- 
tion and diversity of the hepatitis C virus. Proc. Natl. Acad. Sci. USA 
88, 2451-2455. 

Connolly, M.L. (1983). Solvent-accessible surfaces of proteins and
nucleic acids. Science 221, 709-713.

D'Souza, E.D.A., Grace, K., Sangar, D.V., Rowlands, D.J., and Clarke,
B.E. (1995). In vitro cleavage of hepatitis C virus polyprotein
substrates by purified recombinant NS3 protease. J. Gen. Virol. 76,
1729-1736. 

Engh, R.A., and Huber, R. (1991). Accurate bond and angle parame- 
ters for X-ray protein structure refinement. Acta Cryst. A47, 392-400. 

Failla, C., Tomei, L., and De Francesco, R. (1994). Both NS3 and
NS4A are required for proteolytic processing of hepatitis C virus 
nonstructural proteins. J. Virol. 68, 3753-3760. 

Fujinaga, M., Delbaere, T.J., Brayer, G.D., and James, M.N.G. (1985). 
Refined structure of α-lytic protease at 1.7 Å resolution: analysis of
hydrogen bonding and solvent structure. J. Mol. Biol. 783, 479-502. 

Furey, W., and Swaminathan, S. (1990). "PHASES"— a program 
package for the processing and analysis of diffraction data from 
macromolecules. ACA Meeting Summaries 73 (New York: American 
Crystallographic Association). 

Grakoui, A., McCourt, D.W., Wychowski, C., Feinstone, S.M., and
Rice, C.M. (1993). Characterization of the hepatitis C virus-encoded
serine proteinase: determination of proteinase-dependent polyprotein
cleavage sites. J. Virol. 67, 2832-2843.

Han, D.S., Hahm, B., Rho, H.-M., and Jang, S.K. (1995). Identification 
of the protease domain in NS3 of hepatitis C virus. J. Gen. Virol. 76, 
985-993. 

Hijikata, M., Mizushima, H., Akagi, T., Mori, S., Kakiuchi, N., Kato,
N., Tanaka, T., Kimura, K., and Shimotohno, K. (1993). Two distinct
proteinase activities required for the processing of a putative non- 
structural precursor protein of hepatitis C virus. J. Virol. 67, 4665- 
4675. 

Houghton, M. (1996). Hepatitis C viruses. In Fields Virology, Third
Edition, B.N. Fields, D.M. Knipe, and P.M. Howley, eds. (New York:
Raven Press), pp. 1035-1058. 

Ji, X., Zhang, P., Armstrong, R.N., and Gilliland, G.L. (1992). The
three-dimensional structure of a glutathione S-transferase from the
Mu class: structural analysis of the binary complex of isoenzyme 3-3
and glutathione at 2.2 Å resolution. Biochemistry 31, 10169-10184.

Jones, T.A. (1978). A graphics model building and refinement system
for macromolecules. J. Appl. Cryst. 11, 268-272.

Kato, N., Hijikata, M., Ootsuyama, Y., Nakagawa, M., Ohkoshi, S., 
Sugimura, T., and Shimotohno, K. (1990). Molecular cloning of the 
human hepatitis C virus genome from Japanese patients with non- 
A, non-B hepatitis. Proc. Natl. Acad. Sci. USA 87, 9524-9528. 

Koch, J.O., Lohmann, V., Herian, U., and Bartenschlager, R. (1996).
In vitro studies on the activation of the hepatitis C virus NS3
proteinase by the NS4A cofactor. Virology 221, 54-66.

Laskowski, R.J., MacArthur, M.W., Moss, D.S., and Thornton, J.M. 
(1993). PROCHECK: a program to check the stereochemical quality
of protein structures. J. Appl. Cryst. 26, 283-291.

Levitt, M., and Perutz, M.F. (1988). Aromatic rings as hydrogen bond
acceptors. J. Mol. Biol. 201, 751-754.

Lin, C., and Rice, C.M. (1995). The hepatitis C virus NS3 serine
proteinase and NS4A cofactor: establishment of a cell-free
trans-processing assay. Proc. Natl. Acad. Sci. USA 92, 7622-7626.

Lin, C., Pragai, B.M., Grakoui, A., Xu, J., and Rice, C.M. (1994).
Hepatitis C virus NS3 serine proteinase: trans-cleavage requirements
and processing kinetics. J. Virol. 68, 8147-8157.

Linnen, J., Wages, J., Jr., Zhang-Keck, Z.-Y., Fry, K.E., Krawczynski,
K.Z., Alter, H., Koonin, E., Gallagher, M., Alter, M., Hadziyannis, S.,
Karayiannis, P., Fung, K., Nakatsuji, Y., Shih, J.W.-K., Young, L.,
Piatak, M., Jr., Hoover, C., Fernandez, J., Chen, S., Zou, J.-C., Morris,
T., Hyams, K.C., Ismay, S., Lifson, J.D., Hess, G., Foung, S.K.H.,
Thomas, H., Bradley, D., Margolis, H., and Kim, J.P. (1996). Molecular
cloning and disease association of hepatitis G virus: a
transfusion-transmissible agent. Science 271, 505-508.

Liu, S., Ji, X., Gilliland, G.L., Stevens, W.J., and Armstrong, R.N.
(1993). Second-sphere electrostatic effects in the active site of
glutathione S-transferase: observation of an on-face hydrogen bond
between the side chain of threonine 13 and the π-cloud of tyrosine 6
and its influence on catalysis. J. Am. Chem. Soc. 115, 7910-7911.

Matthews, D.A., Smith, W.W., Ferre, R.A., Condon, B., Budahazi, G.,
Sisson, W., Villafranca, J.E., Janson, C.A., McElroy, H.E., Gribskov,
C.L., and Worland, S. (1994). Structure of human rhinovirus 3C
protease reveals a trypsin-like polypeptide fold, RNA-binding site,
and means for cleaving precursor polyprotein. Cell 77, 761-771.

McRee, D.E. (1992). XtalView: a visual protein crystallographic
system for X11/Xview. J. Mol. Graph. 10, 44-47.

Meyer, E., Cole, G., Radhakrishnan, R., and Epp, O. (1988). Structure
of native porcine pancreatic elastase at 1.65 Å resolution. Acta
Cryst. B44, 26-38.

Miller, R.H., and Purcell, R.H. (1990). Hepatitis C virus shares amino
acid sequence similarity with pestiviruses and flaviviruses as well
as members of two plant virus supergroups. Proc. Natl. Acad. Sci.
USA 87, 2057-2061.

Otwinowski, Z., and Minor, W. (1996). Processing of X-ray diffraction 
data collected in oscillation mode. Meth. Enzymol. 276, 307-326. 

Pizzi, E., Tramontano, A., Tomei, L., La Monica, N., Failla, C.,
Sardana, M., Wood, T., and De Francesco, R. (1994). Molecular model
of the specificity pocket of the hepatitis C virus protease:
implications for substrate recognition. Proc. Natl. Acad. Sci. USA 91,
888-892.

Read, R.J., and James, M.N.G. (1986). Introduction to the protein
inhibitors: X-ray crystallography. In Proteinase Inhibitors, A.J.
Barrett and G. Salvesen, eds. (Amsterdam; New York; Oxford: Elsevier
Science Publishers BV), pp. 301-336.

Schwabe, J.W.R., and Klug, A. (1994). Zinc mining for protein
domains. Struct. Biol. 1, 345-349.

Simons, J.N., Leary, T.P., Dawson, G.J., Pilot-Matias, T.J., Muerhoff,
A.S., Schlauder, G.G., Desai, S.M., and Mushahwar, I.K. (1995). Isola- 
tion of novel virus-like sequences associated with human hepatitis. 
Nature Med. 1, 564-569. 

Simmonds, P., Smith, D.B., McOmish, F., Yap, P.L., Kolberg, J., 
Urdea, M.S., and Holmes, E.C. (1994). Identification of genotypes
of hepatitis C virus by sequence comparisons in the core, E1 and 
NS5 regions. J. Gen. Virol. 75, 1053-1061. 

Sommergruber, W., Casari, G., Fessl, F., Seipelt, J., and Skern, T.
(1994). The 2A proteinase of human rhinovirus is a zinc containing
enzyme. Virology 204, 815-818. 

Steinkühler, C., Tomei, L., and De Francesco, R. (1996). In vitro
activity of hepatitis C virus protease NS3 purified from recombinant 
baculovirus-infected SF9 cells. J. Biol. Chem. 271, 6367-6373. 

Suzich, J.A., Tamura, J.K., Palmer-Hill, F., Warrener, P., Grakoui, A., 
Rice, C.M., Feinstone, S.M., and Collett, M.S. (1993). Hepatitis C
virus NS3 protein polynucleotide-stimulated nucleoside triphosphatase
and comparison with related pestivirus and flavivirus enzymes.
J. Virol. 67, 6152-6158. 

Takamizawa, A., Mori, C., Fuke, I., Manabe, S., Murakami, S., Fujita,
J., Onishi, E., Andoh, T., Yoshida, I., and Okayama, H. (1991).
Structure and organization of the hepatitis C virus genome isolated
from human carriers. J. Virol. 65, 1105-1113.

Terwilliger, T.C., Kim, S.-H., and Eisenberg, D. (1987). Generalized
method of determining heavy-atom positions using the difference 
Patterson function. Acta Cryst. A43, 1-5. 

Vallee, B.L., and Auld, D.S. (1990). Zinc coordination, function, and
structure of zinc enzymes and other proteins. Biochemistry 29, 
5647-5659. 



van Doorn, L.J. (1994). Review: molecular biology of the hepatitis
C virus. J. Med. Virol. 43, 345-356. 

Voss, T., Meyer, R., and Sommergruber, W. (1995). Spectroscopic 
characterization of rhinoviral protease 2A: Zn is essential for the
structural integrity. Protein Sci. 4, 2526-2531.

Wang, B.C. (1985). Resolution of phase ambiguity in macromolecular
crystallography. Meth. Enzymol. 115, 90-112.

Yu, S.F., and Lloyd, R.E. (1992). Characterization of the roles of
conserved cysteine and histidine residues in poliovirus 2A protease.
Virology 186, 725-735.



The EMBO Journal vol. 9 no. 8 pp.2 . .2638, 1990 



Cleavage-site preferences of Sindbis virus polyproteins 
containing the non-structural proteinase. Evidence for 
temporal regulation of polyprotein processing in vivo

The non-structural proteins of Sindbis virus, nsP1, 2, 3
and 4, are produced upon cleavage of polyproteins P123
and P1234 by a proteinase residing in nsP2. We used cell
free translation of SP6 transcripts to study the proteolytic
activity of nsP2 and of nsP2-containing polyproteins. To
generate polyprotein enzymes, a set of plasmids was made
in which cleavage sites were eliminated and new initiation
and termination codons introduced by in vitro mutagenesis.
As a substrate, we used a polyprotein in which
the nsP2 proteinase had been inactivated by a single
amino acid substitution. All nsP2-containing polyproteins
cleaved the nsP1/2 site in trans. However, proteinases
containing nsP1 were unable to cleave the nsP2/3 site.
Furthermore, only proteinases containing nsP3 could
cleave the nsP3/4 site. These differences in cleavage site
specificity result in a temporal regulation of processing
in vivo. At 1.7 h post infection P123 and nsP4 accumulated
and only small amounts of P34 were found.
However, at 4 h post infection P123 was processed
rapidly and P34 was produced rather than nsP4. Since
nsP4 is thought to be the viral RNA polymerase, the
temporal regulation of the nsP4/P34 ratio may be
responsible for the temporal regulation of RNA synthesis.
Key words: alphavirus/non-structural proteins/polyprotein/
proteinase



Introduction 

Sindbis virus (SIN) is an enveloped, plus-stranded RNA virus 
belonging to the genus alphavirus of the family Togaviridae. 
The 11.7 kb genome, which is capped and polyadenylated,
acts as a messenger for the production of four non-structural
proteins (nsP1, nsP2, nsP3 and nsP4, ordered from the
NH2-terminus), which are thought to form the viral
transcriptase-replicase complex. During replication, the
genomic RNA is transcribed into full-length minus-strand
RNA, which in turn serves as a template for the synthesis
of new genomic RNA and 26S subgenomic RNA. The latter
RNA species is the messenger for the structural proteins
(Strauss and Strauss, 1986). The synthesis of minus-strand
RNA ceases 3-4 h post infection, while the production of
genomic and subgenomic RNA continues throughout the
infectious cycle (Sawicki and Sawicki, 1980; Sawicki et al.,
1981a,b). The mechanism by which minus-strand synthesis
is switched off is unknown.

The non-structural proteins are made as two polyprotein 
precursors (Strauss et al., 1983, 1984). For both precursors 
translation starts at the same initiation codon. In most 
instances translation stops at an opal termination codon 
downstream of the nsP3 gene, resulting in polyprotein P123
(200 kd). However, at a rather high frequency, read-through 
of the termination codon occurs leading to the synthesis of 
a larger polyprotein (250 kd) which includes nsP4 (P1234). 

Although the functions of the non-structural proteins have 
not been fully elucidated, the characterization of temperature 
sensitive mutants and protein sequence comparisons have 
provided important clues. nsP4 is hypothesized to be the 
actual RNA polymerase (Kamer and Argos, 1984; Hahn
et al., 1989a). nsP1 is involved in minus-strand synthesis
(Hahn et al., 1989b) and also contains a methyl-transferase
activity needed for the capping of viral RNAs (Mi et al.,
1989). nsP2 is required for the synthesis of the 26S
subgenomic RNA and for the shut-off of minus-strand
synthesis (Hahn et al., 1989b). Interestingly, nsP2 also
contains the proteinase responsible for the processing of the
non-structural polyproteins (Ding and Schlesinger, 1989;
Hahn et al., 1989b; Hardy and Strauss, 1989), and thus in
principle could regulate the synthesis of minus-strand and 
subgenomic RNA through proteolytic cleavages. The 
proteolytic activity was localized to the carboxy-terminal half 
of nsP2. This domain shows a limited sequence similarity 
to the papain-like thiol proteinases, implicating Cys 481 as 
the active site residue (Hardy and Strauss, 1989). 

Processing of P123 in vitro was shown to be sensitive to 
dilution, indicating that cleavage of this protein occurs 
predominantly in a bimolecular reaction (in trans); in 
contrast, processing of P12 was not influenced by dilution,
strongly suggesting autoproteolysis (cleavage in cis) (Hardy
and Strauss, 1989). In vivo, the kinetics of processing
indicated that at 3-4 h after infection, P123 is first cleaved
to P12 and nsP3, followed by cleavage of P12 (Hardy and
Strauss, 1988). Paradoxically, however, the elimination of
the nsP2/3 cleavage site by site specific mutagenesis did not
influence the cleavage at the nsP1/2 and nsP3/4 sites, as
studied in vitro, but the elimination of the nsP1/2 site
prevented the cleavage at the nsP2/3 site and resulted in the
accumulation of P123, i.e. cleavage of the nsP1/2 site
appears to be essential for initiation of the processing
pathway (Shirako and Strauss, 1990). These conflicting
observations can be rationalized by postulating that (i) there
are proteolytically active polyproteins containing the nsP2
region; (ii) the different polyprotein proteinases differ in their
preferences for the three cleavage sites; and (iii) at 3-4 h
after infection proteinases with a preference for the nsP2/3
site predominate. In this paper we show by cell free translation
of synthetic transcripts that the different polyproteins
are indeed active proteinases that differ in their cleavage site
preferences. Furthermore, we provide evidence that these
differences result in vivo in a temporal regulation of
polyprotein processing. The possible consequences for
regulation of viral replication are discussed.

Results 

Construction of cDNA clones for expression of 
nsP2-containing polyproteins 

In vitro transcription of cDNA clones, followed by in vitro 
translation, provides a powerful approach to the study of 
the processing of viral polyproteins (Ypma-Wong and
Semler, 1987). We have employed this strategy to determine
the trans-cleavage activities of SIN nsP2 and of
polyproteins containing nsP2. For this purpose we
constructed a set of cDNA clones that when transcribed and
translated in vitro would give rise to nsP2-containing
polyproteins that could be used either as enzymes or substrate
in trans-cleavage assays. Construction of these clones
required creation of new initiation or termination codons,
mutagenesis of cleavage sites to render them non-cleavable,
mutagenesis of the enzymatic domain to inactivate the
proteinase, and replacement of the opal codon following nsP3
with a serine codon. This set of clones and the terminology
used is illustrated schematically in Figure 1A and described
in more detail below. 

Since nsP2 is formed by proteolytic cleavage of a polypro- 
tein, the nsP2 gene does not have an initiation or termina- 
tion codon. In plasmid pToto.2, we have provided the nsP2 
gene with an initiation codon preceding the Ala1 codon, and
an amber stop codon immediately downstream of the 3' 
terminal Ala807 codon. The gene is located downstream of 
an SP6 RNA polymerase promoter and the 5'-terminal non- 
coding 60 nucleotides of the SIN genome, such that transcrip- 
tion with SP6 polymerase leads to an RNA with the authentic 
SIN leader immediately coupled to the nsP2 gene. In vitro 
translation of such transcripts resulted in the synthesis of an 
80 kd polypeptide, which in SDS-PAGE gels comigrated
with the authentic SIN nsP2 protein (Figure 1B, lane 3). To
distinguish this protein product from the nsP2 derived by 
normal proteolytic processing of polyproteins, we will refer 
to it as N2. 

To produce non-cleavable polyproteins containing nsP2, 
we took advantage of observations that the 1/2, 2/3 and 3/4 
cleavage sites can be eliminated by changing them from Gly- 
Ala-Ala, Gly-Ala-Ala and Gly-Gly-Tyr, respectively, to Glu- 
Ala-Ala, Glu-Ala-Ala and Gly-Val-Tyr, respectively 
(Shirako and Strauss, 1990; R.J.de Groot, unpublished 
results). The series of cDNA plasmids illustrated in Figure
1A when transcribed and translated in vitro yield all possible
nsP2-containing polyproteins. In vitro translations of SP6
transcripts derived from these plasmids are shown in Figure
1B. To distinguish these non-cleavable polyproteins from
those produced by translation of wild-type RNA, we will
refer to them with an N rather than a P. Thus, for example,
N123 refers to the polyprotein containing the sequences of
non-structural proteins 1, 2 and 3 in which the cleavage sites
have been eliminated by mutagenesis, whereas P123 is the
polyprotein translated from wild-type RNA. 

To study the trans-cleavage activities of these
nsP2-containing polyproteins, a substrate was required. In
previous experiments, we have used cDNA clones containing
deletions in the protease domain of the nsP2 gene to produce
truncated polyproteins that are unable to process by
themselves (Hardy and Strauss, 1989). However, these deletions
may affect the overall conformation of the polyprotein
and thereby influence the results of a trans-cleavage assay.
Recently, we have found that by changing Cys 481 into Gly
the proteolytic activity of nsP2 is completely abolished
(R. Levinson, R.J. de Groot and J.H. Strauss, unpublished
results). A plasmid was constructed in which the Cys to Gly
substitution in nsP2 was combined with the opal to Ser
substitution at the C-terminus of nsP3 (position 1897 of the
SIN non-structural open reading frame; Li and Rice, 1989)
(Figure 1). Transcription of this plasmid, pToto.S1234,
followed by translation in vitro, resulted in the synthesis of
the uncleaved 250 kd precursor (Figures 2 and 3). This
product, designated S1234, was considered an ideal
substrate, since the single amino acid Cys to Gly substitution
is less likely to change the folding of the polyprotein
than deletions in the nsP2 gene.

Fig. 1. (A) Schematic representation of the constructs used for
expression studies. The nomenclature used for the plasmids and for the
expression products obtained are indicated to the left and right,
respectively. In the diagrams the non-structural protein coding regions
are depicted as boxes with different shading used for each protein; the
cleavage sites, either unaltered or mutagenized, are shown in the one
letter amino acid code. Initiation codons are indicated by open
triangles. The white diamond indicates the opal codon read through
with high efficiency, which is replaced by a serine codon in the
constructs pToto.1234, pToto.234 and pToto.S1234 as indicated; the
solid diamonds depict termination codons resulting in efficient
termination. The asterisk indicates the Cys481 to Gly substitution. (B)
In vitro translation of SP6 transcripts derived from constructs shown in
A. Rabbit reticulocyte lysate was supplemented with [35S]methionine
and RNA and incubated for 1 h at 30°C. Translates were analyzed by
7.5% SDS-PAGE. A translate of SIN strain HR RNA served as a
marker (lane 1). The positions of the SIN non-structural proteins and
their precursors are indicated.

Trans-cleavage assays 

For trans-cleavage assays, the nsP2-containing polyproteins
used as enzymes were synthesized by in vitro translation
using unlabeled methionine, while the substrate S1234 was
radioactively labeled by performing the translation in the
presence of [35S]methionine. The translation reactions were
stopped by adding cycloheximide and excess unlabeled
methionine, after which the enzymes and substrate were
incubated together. Translates of wild-type SIN genomic
RNA were used as positive controls. Incubation of S1234
with the wild-type translate resulted in the production of all
four non-structural proteins as well as of the polyproteins
P12, P123 and P34 (Figure 2, lane 2). P12 and nsP3 were
the most abundant products, indicating that cleavage occurred
predominantly at the 2/3 and 3/4 sites. After incubation of
S1234 with N123 or N1234 as enzymes, only the products
P123, P23, nsP4 and nsP1 were found. Apparently, N123
and N1234 can cleave only the 1/2 and 3/4 sites and are
unable to cleave the 2/3 site. Visual inspection of the
autoradiogram suggested that most S1234 was converted into
P123, indicating that the 3/4 site was predominantly cleaved
(Figure 2). To assess the difference in the efficiency of
cleavage of the 1/2 and 3/4 sites more precisely, we determined
the amounts of input S1234 (lane labeled 'Blank') and
the cleavage products P123 and P23 (lane '+N123') by
densitometry (P234 was not detected). An endogenous
reticulocyte protein labeled during translation served as an
internal control for the amount applied to each lane. N123
cleaved 90-95% of the substrate at the 3/4 site, whereas
only 35-38% of the 1/2 sites were cleaved. nsP4 was
underrepresented in Figure 2. The amounts of nsP4 found in
translates varied between experiments for unknown reasons.
Loss of nsP4 was not always observed and appeared to
depend on storage conditions and the batch of reticulocyte
lysate used (not shown).
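
One way such densitometric estimates could be computed is sketched
below in Python. This is an assumption about the arithmetic, not the
authors' stated procedure: each band is divided by the internal
control band of its own lane, and the fraction cleaved at each site is
taken relative to the normalized input S1234. Differences in
methionine content between fragments are ignored, and all names and
band intensities are hypothetical.

```python
def normalize(band, control):
    """Correct a band intensity for the amount loaded in its lane."""
    return band / control


def cleavage_fractions(input_s1234, p123, p23, control_blank, control_digest):
    """Return (fraction cleaved at 3/4, fraction cleaved at 1/2)."""
    substrate = normalize(input_s1234, control_blank)
    p123_norm = normalize(p123, control_digest)
    p23_norm = normalize(p23, control_digest)
    # P123 and P23 have both lost nsP4, so both count as 3/4 cleavage.
    frac_34 = (p123_norm + p23_norm) / substrate
    # Only P23 has additionally lost nsP1.
    frac_12 = p23_norm / substrate
    return frac_34, frac_12


# Hypothetical densitometer readings (arbitrary units).
frac_34, frac_12 = cleavage_fractions(
    input_s1234=1000.0, p123=560.0, p23=360.0,
    control_blank=100.0, control_digest=100.0,
)
print(f"3/4 site: {100 * frac_34:.0f}% cleaved; 1/2 site: {100 * frac_12:.0f}% cleaved")
```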

Upon incubation of S1234 with the enzymes N23 and
N234, all four non-structural proteins were found, in addi- 
tion to the polyproteins P123, P12, P23, P34 and P234 
(Figure 2). Thus these enzymes are able to cleave all three 
sites. In contrast, processing of S1234 by N12 yielded only 
low amounts of nsPl and P234 (Figure 3). Apparently, 
cleavage at the 2/3 and 3/4 sites did not occur, and enzyme 
N12 is able to cleave only the 1/2 site. 
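
The relationship between the sites an enzyme can cleave and the
products expected from S1234 can be illustrated with a short Python
sketch. It enumerates every fragment whose two ends are either a
polyprotein terminus or a cleavable junction, so it lists what is
possible under partial digestion rather than what was observed (for
example, P234 is listed for N123 although it was not detected). The
site assignments are taken from the results described above; the
function and variable names are hypothetical.

```python
def fragment_name(start, end):
    """Name a fragment spanning non-structural proteins start..end."""
    if start == end:
        return f"nsP{start}"
    return "P" + "".join(str(k) for k in range(start, end + 1))


def possible_products(cleaved_sites, n_proteins=4):
    """Site k denotes the junction between nsPk and nsP(k+1)."""
    products = set()
    for start in range(1, n_proteins + 1):
        for end in range(start, n_proteins + 1):
            if start == 1 and end == n_proteins:
                continue  # the uncleaved substrate itself
            left_free = start == 1 or (start - 1) in cleaved_sites
            right_free = end == n_proteins or end in cleaved_sites
            if left_free and right_free:
                products.add(fragment_name(start, end))
    return products


# Sites cleaved in trans by each enzyme, as reported in the text.
SITES_CLEAVED = {
    "N123": {1, 3}, "N1234": {1, 3},
    "N23": {1, 2, 3}, "N234": {1, 2, 3},
    "N12": {1},
}

for enzyme, sites in SITES_CLEAVED.items():
    print(enzyme, sorted(possible_products(sites)))
```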

Fig. 2. Trans-cleavage assay: processing of the S1234 substrate 
by N123, N1234, N23 and N234 in vitro. 35S-labeled S1234 was 
prepared as described in Figure 1. The polyproteins used as enzymes 
were synthesized similarly except that unlabeled methionine was used. 
Translation reactions were stopped by adding cycloheximide and excess 
unlabeled methionine, after which the enzymes and substrate were 
mixed at a 1:1 (v/v) ratio and allowed to incubate for an additional 
2 h at 30°C. The cleavage products were analyzed by 7.5% 
SDS-PAGE. Translates without added RNA (Blank) and with SIN 
strain HR viral RNA (wt-mix) served as negative and positive 
controls, respectively. The right hand panel shows a longer exposure 
of the bottom half of the gel. 

Fig. 3. Trans-cleavage assay: in vitro processing of the S1234 
substrate by N2, N12 and a translate containing P12, nsP1 and nsP2 
(derived from pToto.1+2). The experimental procedures were as in 
Figure 2. The right hand panel shows a longer exposure of the bottom 
half of the gel. The arrows point to P34, P12 and nsP1 in the N2 
digest of S1234. 

A translation mixture containing nsP1, nsP2 and P12 
derived from pToto.1+2 (Figure 1B, lane 2) cleaved S1234 
predominantly at the nsP2/3 cleavage site: the major products 
observed were P12 and P34. The 1/2 site was cut less effi- 
ciently, and cleavage of the 3/4 site was not detected (Figure 
3). If we assume that P12 has the same cleavage specificity 
as N12, these findings indicate that nsP2 produced by 
cleavage of P12 is able to cleave the 2/3 site but not the 3/4 
site, with possible activity at the 1/2 site unresolved. Essen- 
tially the same results were obtained upon incubation of 
S1234 with N2 derived from pToto.2, as this enzyme was 
able to cleave the 1/2 and 2/3 sites but not the 3/4 site. 
However, the proteolytic activity of N2 was surprisingly low 
(Figure 3). To rule out the possibility that these results were 
caused by mutations introduced into pToto.2 and pToto.12 
during cloning, the nsP2 genes in constructs pToto.12 and 
pToto.2 were sequenced, and the sequence in both was iden- 
tical to that in the parental cDNA clone pToto1101. In addi- 
tion, the 2029 bp EcoRI-PstI fragments (nucleotides 
1920-3949 of the SIN genome; 83% of the nsP2 gene) of 
pToto.2 and pToto.12 were used to replace the corres- 
ponding fragments in pToto1101. The non-structural 
polyproteins translated from transcripts of the resulting 
plasmids were processed normally (data not shown), and thus 
the protease domain appears to be normal. The reason for 
the poor proteolytic activity of N2 remains unclear. The most 
likely explanation is that nsP2, normally produced as part 
of a polyprotein, requires the flanking protein sequences to 
adopt its correct (i.e. proteolytically active) conformation. 
Alternatively, nsPl could act as a cofactor for optimal 
cleavage by nsP2. 

The results of the trans-cleavage assays are summarized 
in Table I. The specificities of the various polyproteins can 
be described with the following rules: (i) the presence of 
nsP4 sequences does not affect the specificity of the 
proteinase, i.e. polyproteins with or without nsP4 had the 
same cleavage specificity; (ii) if nsPl is present in the 
polyprotein, the proteinase is unable to cleave the 2/3 site; 
proteinases lacking the nsPl moiety cleave the 2/3 site 
efficiently; (iii) the presence of nsP3 in the proteinase is 
required for cleavage of the 3/4 site. 
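
The three rules can be restated as a small lookup, shown here as a sketch rather than as part of the original analysis. It assumes, in line with the assays described above, that every nsP2-containing enzyme tested could cut the 1/2 site in trans; the function name and the set-based encoding of each enzyme are illustrative choices, not notation from the paper.

```python
def cleavable_sites(proteinase):
    """Sites an nsP2-containing proteinase is predicted to cut in trans.

    `proteinase` is the set of nsP moieties present in the enzyme,
    for example {"nsP1", "nsP2", "nsP3"} for N123.
    """
    domains = set(proteinase)
    if "nsP2" not in domains:
        return set()          # the proteinase activity resides in nsP2
    sites = {"1/2"}           # assumed: all tested enzymes cut the 1/2 site
    if "nsP1" not in domains:
        sites.add("2/3")      # rule (ii): nsP1 blocks cleavage of 2/3
    if "nsP3" in domains:
        sites.add("3/4")      # rule (iii): nsP3 is required for 3/4
    return sites              # rule (i): nsP4 is never consulted


ENZYMES = {
    "N2": {"nsP2"},
    "N12": {"nsP1", "nsP2"},
    "N23": {"nsP2", "nsP3"},
    "N123": {"nsP1", "nsP2", "nsP3"},
    "N234": {"nsP2", "nsP3", "nsP4"},
    "N1234": {"nsP1", "nsP2", "nsP3", "nsP4"},
}

for name, domains in ENZYMES.items():
    print(name, sorted(cleavable_sites(domains)))
```

The predictions only encode qualitative specificity; they say nothing about efficiency, which differed markedly between enzymes (for example, the low activity of N2).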

Temporal regulation of the [nsP4]/[P34] ratio 
The differences in site-preference of the proteinase 
precursors, observed in vitro, prompted us to consider the 
possibility of post-translational regulation of processing in 
vivo. In particular, our data predicted a temporal regulation 
of the nsP4 to P34 ratio. The rationale for this is that very 
early in infection, P123 and P1234 are expected to be abun- 
dant. Since these enzymes predominantly cleave the 3/4 site 
and do not cleave the 2/3 site, nsP4 would be generated 
rather than P34. However, later in infection, enzymes with 
a preference for the 2/3 site (Table I) will have accumulated 
(Hardy and Strauss, 1988). Rapid cleavage of the 2/3 site 
would not only prevent accumulation of P123 and P1234, 
but would also eliminate the proteinases capable of cleaving 
the 3/4 site (P123, P1234, P23 and P234), thus resulting 
in the accumulation of P34 rather than nsP4. 

To test this hypothesis, primary chicken embryo cells 
infected with SIN strain HR were labeled during a 5 min 
pulse with [ 35 S]methionine early (1 h 45 min) and late (4 h) 
after infection, followed by a chase with excess unlabeled 
methionine (it should be noted that under these conditions 
~15 min are required to synthesize a complete SIN polypro- 
tein, and thus the first part of the chase period is involved 
with completion of initiated labeled chains; Hardy and 
Strauss, 1988). Cell lysates were immunoprecipitated with 
a rabbit antiserum specific for nsP4 (Hardy and Strauss, 
1988). Early in infection, P1234 was predominantly 
processed to produce nsP4, as predicted (Figure 4, early). 
P34 accumulated to some extent during the 5 min chase, 
but then declined, most likely to yield nsP3 and nsP4. The 
amount of P1234 present could not be determined because 
of the presence of a 250 kd host protein which precipitated 
non-specifically. Conversely, at 4 h after infection P34 
accumulated rather than nsP4 (Figure 4, late). The 250 kd 
host protein was not present, probably due to effective shut- 
off of host protein synthesis, allowing visualization of P1234. 
Since P1234 was not present in large amounts, cleavage at 
the 2/3 site occurred either cotranslationally or immediately 
after termination of translation (Figure 4, late). During the 
chase P34 was converted into a form with slightly lower 
mobility, most likely due to phosphorylation (Hardy and 
Strauss, 1988; Peranen et al., 1988; G.Li, M.W.LaStarza, 
W.R.Hardy, J.H.Strauss and C.M.Rice, in preparation). 

Fig. 4. Synthesis of P34 and nsP4 early and late in SIN infection. 
Confluent monolayers of secondary chicken embryo cells were infected 
with SIN strain HR at an m.o.i. of 50 p.f.u./cell. After 1 h incubation 
at 37 °C, the inoculum was removed and replaced by Eagle's medium, 
containing 5 µM methionine. At 1 h 45 min p.i. ('early') or 4 h p.i. 
('late') the viral proteins were labeled with [35S]methionine during a 
5 min pulse and then either lysed immediately or after a chase with 
excess unlabeled methionine for the periods of time indicated (in min). 
The lysates were immunoprecipitated with a monospecific rabbit 
antiserum raised against nsP4. Immunoprecipitates were analyzed by 
7.5% SDS-PAGE. The following controls were included: (a) mock- 
infected cells were lysed after a 5 min pulse labeling and a 0 min or 
30 min chase, and subjected to immunoprecipitation with anti-nsP4 
serum; (b) SIN infected cells were lysed after a 5 min pulse labeling 
and a 0 or 30 min chase and subjected to immunoprecipitation with 
anti-nsP4 pre-immune serum. An in vitro translate of SIN strain HR viral 
RNA served as a marker. The positions of the SIN non-structural 
proteins and their precursors are given. 

To study the processing of nsP2-containing polyproteins 
at the two different time-points, immunoprecipitations were 
also performed with a rabbit antiserum directed against nsP2 
(Figure 5). Early in infection, P123 accumulated during the 
first 15 min of chase but declined thereafter, whereas the 
amount of nsP2 increased steadily during the chase (Figure 
5, early). In contrast, P123 did not accumulate late in infec- 
tion, but appeared to be processed rapidly to generate P12 
and nsP2 (Figure 5, late). 

Discussion 

Many RNA viruses express their genetic information by 
synthesis of polyproteins that are post-translationally cleaved 
to generate the functional viral gene products. Host 
proteinases may be involved in the case of structural proteins 
that mature in subcellular organelles, but in the case of 
proteins processed in the cytosol the proteinases responsible 
are encoded by the virus itself (Rice and Strauss, 1981; 
Wellink and van Kammen, 1988). Alphaviruses produce both 
their structural and non-structural proteins by processing of 
polyprotein precursors. Whereas the proteolytic cleavage of 
the structural polyprotein has been studied in considerable 
detail (Hahn et al., 1985; Melancon and Garoff, 1986, 1987; 
Schlesinger and Schlesinger, 1986; Strauss and Strauss, 
1986), only recently have details about the processing of the 
non-structural proteins become known. These studies on the 
non-structural proteins have mostly used SIN, and have been 
aided by the availability of a full-length cDNA clone (Rice 
et al., 1987) and monospecific antisera raised against each 
of the non-structural proteins (Hardy and Strauss, 1988). It 
has now been firmly established that all three cleavage sites 
in P1234 are cut by a proteinase residing in the C-terminal 
half of nsP2 (Ding and Schlesinger, 1989; Hardy and 
Strauss, 1989; Shirako and Strauss, 1990). 

Fig. 5. Synthesis of nsP2 and nsP2-containing precursors early and 
late in SIN infection. The samples described in Figure 4 were 
immunoprecipitated with a monospecific rabbit antiserum raised against 
SIN nsP2. Material from an equivalent number of cells was loaded in 
each lane, and the samples were analyzed in the same gel. A 10 day 
exposure of the autoradiogram is shown for the panel labeled 'early' 
and a 1.5 day exposure for the panel labeled 'late'. The following 
controls are included: (a) mock-infected cells, lysed after a 5 min pulse 
labeling and a 30 min chase, subjected to immunoprecipitation with 
anti-nsP2 serum; (b) SIN infected cells, lysed after a 5 min pulse labeling 
and a 30 min chase, subjected to immunoprecipitation with anti-nsP2 pre- 
immune serum. 

Here, we have studied the proteolytic activities of 
nsP2-containing polyproteins by using in vitro mutagenesis 
techniques and cell free translation of synthetic transcripts. 
Our data show that all nsP2-containing polyproteins are 
proteolytically active and can cleave the 1/2 site in trans. 
However, the precursors differ strikingly in their cleavage 
site preference with respect to the 2/3 and 3/4 sites. Polypro- 
teins containing nsP1 were unable to cleave the 2/3 site in 
vitro, whereas the proteinases lacking nsP1 cleaved the 2/3 
site very efficiently. Furthermore, the presence of nsP3 in 
the proteinase appeared to be required for cleavage of the 
3/4 site. An analogous situation exists for poliovirus in that 
the P2 polyprotein is cleaved in trans by the 3C proteinase 
(Ypma-Wong and Semler, 1987) but the P1 polyprotein can 
only be cleaved by the 3CD proteinase precursor (Jore et al., 
1988; Ypma-Wong et al., 1988). The mechanism by which 
the SIN proteinases change their cleavage site preference 
is unknown. It is possible that after nsP1 is removed, the 
proteinase refolds and assumes an altered conformation that 
then enables it to cleave the 2/3 site. Alternatively, the nsP1 
sequences may simply prevent cleavage of the 2/3 site by 
steric hindrance. The sequence of the cleavage site could 
also be important: the 3/4 cleavage site is Gly-Gly-Tyr, while 
the 1/2 and 2/3 sites are both Gly-Ala-Ala. 



Processing of P123 in vitro is dilution sensitive, indicating 
that at least the initial cleavage between nsP1 and nsP2 occurs 
in a bimolecular reaction (Hardy and Strauss, 1989; Shirako 
and Strauss, 1990). Apparently, it is this dependence on 
trans-cleavage combined with the fact that the proteinases 
differ in their cleavage site preference that allows the virus to 
regulate post-translationally the synthesis of the non-struc- 
tural proteins and their precursors. Our present view on the 
early events in SIN infection is as follows (Figure 6). After 
its release in the cytoplasm, the genomic RNA is translated 
into P123 and P1234. These proteinases preferentially cut 
the 3/4 site, resulting in the accumulation of P123 and nsP4 
(Figure 6A). The 1/2 site is also cut but with lower effi- 
ciency than the 3/4 site (Figure 6B). Cleavage of the 1/2 
site, however, unleashes proteinases with a strong preference 
for the 2/3 site. As these latter proteinases accumulate in 
the cytoplasm, the cleavage of the polyproteins is gradually 
redirected from the 3/4 site to the 2/3 site, generating P34 
and large amounts of nsP3 and P12. The P12 precursor 
cleaves in cis to generate free nsP1 and nsP2 (Hardy and 
Strauss, 1989) and thereby adds to the proteolytic activity 
directed against the 2/3 site. Finally, at 3-4 h after infection 
a situation is reached in which nsP2 is present in such high 
concentrations that cleavage of the 2/3 site occurs either 
cotranslationally or immediately after termination of transla- 
tion, thereby simultaneously preventing accumulation of 
P123, eliminating the enzymes that can cleave the 3/4 site, 
and generating P34 rather than nsP4 (Figure 6C). The fact 
that at 3-4 h after infection P34 is abundant and nsP4 is 
present in only low amounts (if at all), had previously led 
to the hypothesis that P34 is the active protein species rather 
than nsP4 (Hardy and Strauss, 1988; Strauss et al., 1988). 
Our present results suggest that both protein species are 
functional, but at different times in infection. 

It seems highly likely that the temporal regulation of the 
non-structural proteins is important for the development of 
the virus life cycle. For one, polyproteins like P123 could 
in principle perform functions required early in infection that 
cannot be performed by the final endproducts. But even more 
intriguing is the temporal regulation of the ratio of nsP4 to 
P34. Several observations indicate that nsP4 is the actual 
RNA polymerase. It contains the Gly-Asp-Asp motif found 
in the polymerases of various plant and animal viruses 
(Kamer and Argos, 1984). Moreover, certain ts mutations 
in nsP4 result in a total termination of RNA elongation at 
the non-permissive temperature (Barton et al., 1988; Hahn 
et al., 1989a). It is therefore tempting to speculate that the 
regulation of the nsP4 to P34 ratio is correlated with a 
temporal regulation of RNA synthesis, e.g. regulation of 
minus-strand synthesis, which normally ceases at 3-4 h 
after infection (Sawicki and Sawicki, 1980; Sawicki et al., 
1981a,b), of subgenomic RNA synthesis, or both. Since 
nsP4 is predominantly made early in infection, this protein 
could be a polymerase required for minus-strand synthesis, 
while P34 could be required for the synthesis of the 
subgenomic RNA species. In this case, competition between 
nsP4 and P34 for association with replicase complexes could 
provide a plausible explanation for the shut-off of minus- 
strand synthesis at 3-4 h after infection. This hypothesis 
is supported by results obtained for mutant ts17. At the 
restrictive temperature, this mutant shows aberrant 
processing of the 2/3 site, decreased synthesis of subgenomic 
RNA, and continuous minus-strand synthesis (Hardy et al., 
1990; Hahn et al., 1989b; Sawicki and Sawicki, 1985). In 
accordance with our hypothesis, the ts lesion also results in 
an increased production of nsP4 at the restrictive tempera- 
ture, whereas P34 is turned over or processed rapidly (Hardy 
et al., 1990). Sawicki and Sawicki (1985) showed that at 
the restrictive temperature, ts17-infected cells can resume 
minus-strand synthesis after complete shut-off and this 
resumption occurs even when protein synthesis is inhibited 
by cycloheximide. On the basis of these results it was 
concluded that proteolytic cleavage could not explain the 
shut-off of minus-strand synthesis. However, it seems quite 
possible that after the shift to the non-permissive temperature, 
nsP4 could be generated from pools of P34 already present 
in the cell, i.e. nsP4 could be produced even without de novo 
protein synthesis and lead to the resumption of minus-strand 
synthesis. 

Fig. 6. Cleavage pathways of the SIN non-structural polyproteins: a model for the temporal regulation of cleavage. (A) Very early in infection 
cleavage of the 3/4 site is presumed to be the major cleavage pathway resulting in accumulation of P123 and nsP4. (B) P123 and P1234 are also cut 
at the 1/2 site early in infection but with lower efficiency. Cleavage of the 1/2 site results in the production of proteinases (nsP2, P23 and P234) that 
can cleave the 2/3 site. The mixture of proteinases eventually leads to the cleavage of all three bonds and thus to the production of nsP1, nsP2, nsP3 
and nsP4. (C) As infection proceeds proteinase active on the 2/3 site accumulates in the cytoplasm and pathway C becomes increasingly important 
until at 4 h p.i. it is predominant: the polyproteins are cut at the 2/3 site cotranslationally or immediately after translation termination, preventing 
accumulation of P123. Proteinases that can cleave the 3/4 site (P123, P1234, P23 and P234) are eliminated and P34 is produced rather than nsP4. 

The question remains open as to the function of the opal 
termination codon between nsP3 and nsP4, especially in view 
of the fact that in at least two alphaviruses this codon is 
replaced with a sense codon (Strauss et al., 1988; Takkinen, 
1986). Readthrough of the SIN opal codon occurs at a 
frequency of ~20% in vitro (R.J.de Groot, unpublished 
data). If one assumes that readthrough occurs with similar 
efficiency in vivo, the synthesis of P34 and nsP4 would be 
reduced at least 5-fold as compared with nsPl, nsP2 and 
nsP3. Apparently, this down-regulation is either not 
necessary or accomplished in a different fashion in those 
viruses lacking the termination codon. 
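
The 5-fold figure follows directly from the readthrough frequency, as the short calculation below illustrates. It is a sketch under two assumptions stated in the text: that nsP4-containing products arise only when the opal codon is read through, and that the ~20% in vitro frequency also applies in vivo. The function name is hypothetical.

```python
def fold_reduction(readthrough_frequency):
    """Fold reduction in synthesis of products made only by readthrough,
    relative to products encoded upstream of the stop codon."""
    if not 0.0 < readthrough_frequency <= 1.0:
        raise ValueError("readthrough frequency must be in (0, 1]")
    return 1.0 / readthrough_frequency


# ~20% readthrough, as measured in vitro for the SIN opal codon.
print(fold_reduction(0.20))
```

A lower readthrough frequency in vivo would give a larger reduction, which is why the text says "at least" 5-fold.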

In summary, we have shown that the SIN non-structural 
proteinases differ in their cleavage site preferences and we 
have provided evidence that these differences result in a 
temporal regulation of protein processing during infection. 
Several observations link the temporal regulation of the nsP4 
to P34 ratio to the regulation of RNA synthesis, but direct 
evidence for this has not yet been obtained. We hope to 
address this point by studying the kinetics of viral RNA 
synthesis for SIN mutants in which the 3/4 cleavage site has 
been rendered uncleavable or more efficient by site directed 
mutagenesis. 

Materials and methods 

Plasmids, enzymes and general methods 

pToto1101 is a full-length cDNA clone of the HR strain of SIN from which 
infectious RNA can be transcribed in vitro (Rice et al., 1987). pToto1000.S, 
a derivative of the full-length clone in which the opal codon between nsP3 
and nsP4 has been replaced by a serine codon (Li and Rice, 1989), was 
kindly provided by C.M.Rice. pToto57, containing a unique XbaI site at 
position 54 of the SIN genome, was kindly supplied by R.J.Kuhn. pGEM5-Zf 
was obtained from Promega Biotech. The plasmids were introduced in 
Escherichia coli strain MC1061.1 by CaCl2 transformation. E.coli strain 
BW313 was used for the preparation of uracil-containing template DNA 
used for in vitro mutagenesis (Kunkel, 1985). Restriction endonucleases, 
T4 DNA ligase, T4 DNA polymerase, polynucleotide kinase, the Klenow 
fragment of E.coli DNA polymerase I, and SP6 RNA polymerase were 
from New England Biolabs. S1 nuclease was from BRL. Modified T7 DNA 
polymerase (Sequenase) was from US Biochemicals. Taq DNA polymerase 
was from Promega. Both single stranded and double stranded DNA were 
sequenced by the T7 DNA polymerase (Sequenase) dideoxy chain termina- 
tion method under conditions recommended by the manufacturer (US 
Biochemicals). Standard DNA manipulations and cloning procedures were 
done according to Maniatis et al. (1982). 

Preparation of in vitro transcripts 

Plasmid DNA for in vitro transcription was prepared by alkaline lysis 
(Maniatis et al., 1982). The DNA templates were linearized with XhoI, 
purified and used for in vitro transcription by SP6 RNA polymerase as 
described previously (Hardy and Strauss, 1989). 

In vitro translation 

In vitro translation was carried out in a 10 µl reaction containing nuclease 
treated, methionine depleted rabbit reticulocyte lysate (Promega) 
supplemented with 10 µCi [35S]methionine (> 1000 Ci/mmol; Amersham), 
20 µM of an unlabeled amino acid mixture lacking methionine (Promega) 
and 10-50 ng RNA. Incubation was for 1 h at 30 °C. The reactions were 
stopped by adding 40 µl sample buffer (Laemmli, 1970). 

For trans-cleavage experiments, the polyproteins to be tested for proteinase 
activity were prepared by in vitro translation as described but in the absence 
of [35S]methionine. Instead unlabeled methionine was added to a final 
concentration of 20 µM. Synthesis of both substrate and enzymes was stopped 
by adding 6 mg/ml cycloheximide and 20 mM methionine to final concen- 
trations of 0.6 mg/ml and 1 mM, respectively. Subsequently, enzymes and 
substrate were mixed at a 1:1 (v/v) ratio and allowed to incubate for an 
additional 2 h at 30 °C. The translation products were analyzed by 
SDS-PAGE (Laemmli, 1970) in 7.5% polyacrylamide gels (acrylamide:bis- 
acrylamide 75:1). The gels were fixed in 50% methanol, 12% acetic acid 
for 30 min, impregnated with EN3HANCE (Dupont) according to the 
instructions of the manufacturer, and autoradiographed. Densitometry was 
performed using a computing densitometer (Molecular Dynamics, model 
300A) and the ImageQuant software. The relative amount of input substrate S1234 
(determined for the blank) was compared with that of the cleavage products. 
A [35S]Met-labeled endogenous protein of the reticulocyte lysate served as 
an internal control to assure that similar amounts of the undigested and 
digested S1234 had been loaded on the gel. 

Plasmid constructs 

The SIN nsP2 protein is generated by proteolytic cleavage of a polyprotein 
precursor and therefore does not have an initiation or a termination codon. 
To construct a transcription/translation vector that could be used to express 
the nsP2 gene, the 714 bp PstI-BamHI fragment derived from pToto1101 
(nucleotides 3949-4663 of the SIN genome) was subcloned into M13mp19. 
The Ala1 codon of nsP3 was converted into an amber stop codon by site 
directed mutagenesis (Kunkel, 1985) using the oligonucleotide primer 
5'-GACGCCTAGGCTCCAACTCCATCTCT-3'. The introduced nucleo- 
tide substitutions (underlined) also resulted in the creation of an AvrII site. 
The mutated PstI-BamHI fragment was then excised from RF DNA and 
inserted into an intermediate vector pSCV23, which contains the 2974 bp 
BglII-SpeI fragment of pToto1101 (nucleotides 2288-5262 of the SIN 
genome) (Shirako and Strauss, 1990), and which had been prepared by diges- 
tion with PstI and BamHI. The resulting plasmid was called p6-8-2. To 
generate an initiation codon at the 5' end of nsP2, the 1146 bp Fnu4HI 
fragment from pToto1101 (nucleotides 1677-2823 of the SIN genome) 
was treated with S1 nuclease followed by digestion with ClaI. This 
Fnu4HI(blunt-ended)-ClaI fragment (nucleotides 1680-2712) was then 
joined to the ClaI-AvrII fragment (nucleotides 2712-4099 of the SIN 
genome) from p6-8-2 and to the NcoI (blunt-ended with Klenow)-SpeI frag- 
ment from pGEM5-Zf in a three piece ligation. The resulting plasmid pGEM 
8-2 contains a complete nsP2 gene, with an initiation codon (supplied by 
the blunt-ended NcoI site) preceding Ala1 and an opal stop codon at its 
3' end. The reconstructed nsP2 gene was inserted downstream of the SP6 
promoter and the SIN 5' non-coding leader sequence in a four piece liga- 
tion involving the 2849 bp XbaI (blunt-ended with Klenow)-SalI (posi- 
tion 11087) fragment of pToto57 and the following fragments of pGEM 
8-2: NcoI-BanI (nucleotides 1675-1902), BanI-ClaI (nucleotides 
1902-2712) and ClaI-SalI (nucleotides 2712-4KM; the SalI site was 
provided by the polylinker of pGEM5-Zf). The nsP2 moiety of the resulting 
construct, pToto.2, was sequenced to ensure that no additional mutations 
had arisen during the manipulations. The sequence obtained was identical 
to that of the parental clone pToto1101. 

pP1-539E/P2-806E is a derivative of pToto1101 in which the nsP1/2 and 
the nsP2/3 cleavage sites have been eliminated by changing them from Gly- 
Ala-Ala to Glu-Ala-Ala (Shirako and Strauss, 1990) (P1-539E signifies that 
amino acid 539 of nsP1 has been changed to Glu, etc.). pP1-539E/P2-806E 
was cut with ClaI (position 2712) and SalI (position 11087) and ligated 
to the ClaI-SalI fragment from pToto.2 to give plasmid pToto.12, which 
encodes a non-cleavable P12 polyprotein. 

pToto.SA3 is a pToto1101 derivative in which the nsP3/4 cleavage site 
has been destroyed by changing it to Gly-Val-Tyr (R.J.de Groot, unpublished 
results) and in which the opal to Ser change following nsP3 has been 
transferred from pToto1000.S. To create plasmid pToto.1234 encoding a 
non-cleavable P1234 polyprotein, the SpeI-BssHII fragment from pToto.SA3 
(nucleotides 5262-9804 of the SIN genome) was cloned into 
pP1-539E/P2-806E. Plasmid pToto.234, encoding non-cleavable P234, was 
constructed by cloning the 9028 bp ClaI-XhoI fragment of pToto.1234 
into pToto.2. 

To construct transcription/translation plasmids for P123 and P23, an 
additional stop codon was created immediately downstream from the 
opal stop codon following nsP3 (position 5750) in order to eliminate the 
partial readthrough that occurs. A 458 bp fragment was produced by 
PCR amplification (Saiki et al., 1988) using the oligonucleotides 
5'-ATGACAGTAGCAAGGCTCACTTT-3' (SIN nucleotides 5207-5230) 
and 5'-GTCGACTATCAGTATTCAGTCCTCCTGCTCCTG-3' (nucleo- 
tides 5633-5665; nucleotide substitutions are underlined) as primers and 
pToto1101 as a template. The resulting fragment was cut with SpeI and 
cloned into pToto.1234 or pToto.234 which had been cut with SpeI (posi- 
tion 5262) and StuI (position 10768), to produce pToto.123 or pToto.23, 
respectively. 
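
As a consistency check on the constructs, the stated fragment lengths can be compared with the nucleotide coordinates given for them. The sketch below assumes the convention used throughout this section, namely that length equals end coordinate minus start coordinate. The list and function names are illustrative, and the restriction-site names in the descriptions are written in conventional form.

```python
# (description, start nucleotide, end nucleotide, stated length in bp)
FRAGMENTS = [
    ("EcoRI-PstI fragment of pToto.2 and pToto.12", 1920, 3949, 2029),
    ("BglII-SpeI fragment carried by pSCV23", 2288, 5262, 2974),
    ("Fnu4HI fragment used to supply the 5' end of nsP2", 1677, 2823, 1146),
    ("PCR fragment spanned by the two primers", 5207, 5665, 458),
]


def length_from_coordinates(start, end):
    """Fragment length under the end-minus-start convention."""
    return end - start


for description, start, end, stated in FRAGMENTS:
    computed = length_from_coordinates(start, end)
    status = "ok" if computed == stated else "MISMATCH"
    print(f"{description}: {computed} bp computed, {stated} bp stated ({status})")
```

The check only covers fragments for which both a length and a coordinate pair appear in the text; it does not validate the restriction sites themselves.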

In vivo labeling of non-structural proteins 

Confluent monolayers of chicken embryo fibroblasts were infected at a 
multiplicity of 50 p.f.u./cell with Sindbis virus strain HR as described 
previously (Hardy and Strauss, 1988). After 60 min at 37 °C [1 h post infec- 
tion (p.i.)] the inoculum was removed and the monolayers were washed 
with phosphate buffered saline lacking divalent cations (PBS) to remove 
unabsorbed virus. Eagle's Minimal Essential Medium, pre-warmed to 37 °C, 
containing 3% dialyzed fetal calf serum, 1 µg/ml actinomycin D, and 5 µM 
methionine (1/20 the normal concentration) was then added and incubation 
was continued at 37 °C. At either 1 h 45 min p.i. or 4 h p.i. the medium 
was removed and the cells were washed with PBS, prewarmed to 37 °C, 
to remove any residual methionine. The cells were then labeled for 5 min 
in Eagle's methionine free medium supplemented with 80 µCi/ml 
[35S]methionine (> 1000 Ci/mmol, Amersham Corp.). After the pulse the 
cells were lysed, either immediately or following a chase for various times 
in Eagle's medium containing 2 mM methionine. The preparation of whole 
cell lysates and the immunoprecipitations were performed as described 
previously (Hardy and Strauss, 1988). The immunoprecipitated products 
were analyzed by electrophoresis on 7.5% SDS-polyacrylamide gels. 

Proteinase 3C of poliovirus type 2 (Sabin) was expressed at 4% total protein in Escherichia coli. The protein 
was soluble and could be purified by a simple scheme. It was weakly active on the capsid precursor P1 
(expressed in vitro), which contains two cleavage sites. The products of processing of P1 were 1ABC and 1D 
(VP1). The activity was insensitive to Triton X-100. Crude extracts of cells infected with poliovirus type 1 
(Mahoney) gave strong processing and yielded 1AB (VP0), 1C (VP3), and 1D in the same assay system but were 
sensitive to detergent. 3C from cell extracts that was separated from its precursors resembled the recombinant 
proteinase in its activity. Recombinant 3C cleaved the peptide dansyl-Glu-Glu-Glu-Ala-Met-Glu-Gln-Gly-Ile- 
Thr-Asn-Lys-NH2 at the Gln-Gly bond. We conclude that 3C is merely the core of the Gln-Gly-cleaving activity 
which processes P1 in vivo and that there is probably a hydrophobic contact between a larger 3C precursor and 
its P1 substrate which allows the second processing reaction: 1ABC, 1D -> 1AB, 1C, 1D. 



Poliovirus is a picornavirus and is the best-studied repre- 
sentative of the genus Enterovirus, which includes a variety 
of serious pathogens. Enteroviruses are also broadly related 
to the rhinoviruses, which are etiological agents of the 
common cold. These genera have essentially the same 
genomic structure, and within this group results for a single 
virus are mostly of general significance. The genome of 
poliovirus is a single ~7.5-kilobase RNA molecule that 
includes an open reading frame large enough to encode a 
247-kilodalton polyprotein (14). The polyprotein is cotrans- 
lationally processed by at least two viral proteinases, map- 
ping to the 3C and 2A regions of the viral genome (9, 29). 
Most of the processing sites are between Gln and Gly 
residues (14), and it has been shown (8) that the polypeptide 
sequence of the viral product 3C is essential for this process 
and furthermore that 3C releases itself from longer precur- 
sors (9) by cleavage at the two Gln-Gly bonds which flank 
the polypeptide. Two other polypeptides that contain all of 
the 3C sequence are present at high concentration in infected 
cell extracts. The most abundant is 3CD, which also contains 
the whole 3D polymerase sequence and appears to be a 
relatively stable precursor of both 3C and 3D. The other 
polypeptide, 3C', is derived from 3CD by cleavage at a 
2A-specific site (Tyr-Gly) and contains a small segment of 
the 3D protein. It has not been shown previously that 3C 
itself is catalytically active, although it releases itself from 
small engineered precursors (9, 12). 

On the basis of a comparison of the encoded amino acid 
sequences of related viruses, the likely catalytic residues of 
3C were suggested (1). These results were in agreement with 
the identification of 3C from encephalomyocarditis virus 
(EMCV) as a cysteine proteinase by biochemical methods 
(22). Site-directed mutagenesis has been used to confirm the 
essential function of Cys-147 and His-161 (12). 

It appears that any evolutionary relationship between 3C 
and nonviral proteinases (cysteine proteinases in 
particular) is remote, although it has been suggested that 3C 
is homologous to the trypsinlike family of serine proteases 
(7; J. F. Bazan and R. J. Fletterick, Proc. Natl. Acad. Sci., 
in press). The apparently strong specificity of 3C and its 
homolog toward the P1 residue (glutamine), of itself, is a 
remarkable feature in a cysteine proteinase. Together, these 
observations indicate that the active site of 3C is a suitable 
target for the design of antiviral drugs that would have a 
minimal effect on healthy proteolysis by host enzymes. 

Other workers have described expression systems for 3C, 
but it has been necessary to develop a trans assay for 3C 
activity to show that any purification scheme would yield 
active material for biochemical characterization and crystal- 
lography. Furthermore, it has been reported that 3C ex- 
pressed in Escherichia coli is insoluble. We (18) and others 
(32) have reported methods for assaying Gln-Gly cleavage 
activity (QG-ase) in infected cell extracts by trans-cleavage 
of the capsomer precursor P1 (Fig. 3A), which we express in 
vitro. When infected cell extracts are used as the source of 
QG-ase, the products of processing of P1 are as seen in vivo, 
namely, P1 -> 1AB, 1C, 1D. We used this assay for protein 
expressed in Escherichia coli. Ultimately, it will be more 
convenient to use a peptide substrate for 3C, and we 
demonstrate here that a peptide can be processed specifi- 
cally by the recombinant enzyme. 

MATERIALS AND METHODS 

Bacterial culture medium. M medium was M9 medium (17) 
containing 2 g of NH4Cl per liter and supplemented with 2 g
of Casamino Acids per liter, 10 g of glucose per liter, 20 µM
ferric chloride, and 0.1 g of sodium ampicillin per
liter. 

Buffers. The pHs of all buffers were determined at 22°C.
Buffer A was 100 mM NaCl-10 mM Tris hydrochloride (pH
7.9). Buffer B was 100 mM KCl-20 mM Tris hydrochloride-
10 mM MgCl2-5 mM dithiothreitol (DTT)-1 mM EDTA (pH
7.9). Buffer C was 40 mM Tris hydrochloride-1 mM DTT
(pH 7.9). Buffer D was 100 mM NaCl-20 mM
N-2-hydroxyethylpiperazine-N'-2-ethanesulfonic acid (HEPES)-NaOH-
1 mM EDTA-1 mM DTT (pH 7.4). Buffer E was 10 mM Tris
hydrochloride-10 mM NaCl-1 mM MgCl2 (pH 7.4). Buffer F
was 10% (vol/vol) glycerol-100 mM KCl-10 mM HEPES-
KOH-1 mM EDTA-1 mM DTT (pH 8.0). Buffer G was 140
mM KCl-20 mM HEPES-KOH-2 mM DTT-5 mM EDTA
(pH 7.2). Buffer S (1x) was 12% (vol/vol) glycerol-76 mM
Tris hydrochloride-4 mM Tris-25 mM DTT-1% sodium
dodecyl sulfate.

Gel electrophoresis of protein samples. Samples were pre- 
pared in buffer S. All electrophoresis was performed in 0.1% 
sodium dodecyl sulfate. Except in the analysis of P1 processing,
all gels were 10 to 20% polyacrylamide gradient gels
(40 parts acrylamide to 1 part methylenebisacrylamide [bis]). 
Otherwise, gels were homogeneous 15% acrylamide gels 
(175 parts acrylamide to 1 part bis). The buffer system used 
has been described previously (18). 

Preparation of substrate peptide dansyl-EEEAMEQGITNK-NH2.
The substrate peptide dansyl-EEEAMEQGITNK-NH2,
of which the first 11 amino acids correspond to the
cleavage site between poliovirus polypeptides 2A and 2B,
was synthesized with an automatic synthesizer (SAM2;
Biosearch) on methylbenzhydrylamine resin by standard
peptide chemistry. The dansyl group was coupled to the
peptide on the resin before HF cleavage by treating the free
amino terminus with excess dansyl chloride. The peptide
was purified to homogeneity by reverse-phase high-pressure
liquid chromatography.

Construction of 3C expression plasmid pMN35. The T7 
expression vector pAR2113 (24) and the expression strain E.
coli BL21(DE3) (26) were kind gifts of J. Dunn and F. W.
Studier (Brookhaven National Laboratory, Upton, N.Y.). 
Restriction endonucleases were obtained from New England 
BioLabs, Inc., and T4 DNA ligase was obtained from 
Bethesda Research Laboratories, Inc. Standard procedures 
were used in the construction of plasmids. Oligonucleotides 
were synthesized on a Microsyn 1450A apparatus (Systec 
Inc.). Poliovirus 3C cDNA was derived from a partial cDNA 
clone [pVS(2)2501] of the Sabin strain of poliovirus type 2 
(28). 

Preparation of bacterial paste. E. coli BL21(DE3) was
transformed with plasmid pMN35. After 12 h of incubation
at 37°C, a colony was picked to inoculate 50 ml of prewarmed
Lennox broth and grown with shaking at 300 rpm at
37°C. After about 3.5 h, the culture (absorbance, 0.5) was added to
0.5 liter of M medium in a 2-liter flask and incubated as
before for about 4 h, after which the absorbance reached 0.5 again.
The culture was injected into 9.5 liters of M medium and
grown with stirring (900 rpm) and aeration (16 liters/min) in
a 14-liter fermentor (Microferm II; New Brunswick Scientific
Co., Inc.). The pH was maintained between 7.3 and 7.8
by addition of 10 M NaOH. When the absorbance of a 1/5 dilution
of culture (into 0.1 M NaCl) reached 0.7, expression was
induced by adding 0.4 mM isopropyl-β-D-thiogalactopyranoside.
After 2.5 h, the culture was harvested. Crushed ice
(5 kg) was added to the culture with stirring. The bacteria
were collected by centrifugation at 3,500 x g for 15 min. The
pellets were suspended in 1 liter of buffer A and recentrifuged.
The final pellet was stored at -80°C.

Purification of recombinant 3C. All procedures for purifi- 
cation of recombinant 3C were performed in a room refrig- 
erated to 4°C, unless stated otherwise. Bacterial cell paste 
(15 g) was thawed and suspended to 40 ml with buffer A. 
Bacteria were lysed by two passages through a French 
pressure cell at 70 MPa. The lysate was mixed with a further 
40 ml of buffer B (fraction 1) and centrifuged for 2 h at
360,000 x g in a Beckman 60 Ti rotor (at 59,000 rpm) to
yield a clear supernatant (70 ml), which was decanted and 
diluted to 200 ml with water. A 1.2-ml volume of 1 M Tris 
base was added dropwise with stirring (fraction 2). The final 
pH of a sample at room temperature was 8.3. This solution 
was layered on a column (180 ml) of DEAE-cellulose 
(Whatman DE-52) that had been preequilibrated with buffer 
C. The sample was eluted with buffer C. The first 130 ml of 
effluent was discarded, and the next 320 ml was collected. A 
2.5-ml volume of 0.5 M EDTA-Na3H and 5 ml of 0.5 M
morpholinoethanesulfonic acid were added dropwise with 
stirring (fraction 3). Ammonium sulfate was added, with 
stirring, at 0.55 g/ml to fraction 3 over 15 min, and precipi- 
tation was allowed to proceed for 12 h. Precipitated protein 
was collected by centrifugation at 6,000 x g for 60 min at 
8°C. The supernatant was discarded, and the pellet was 
carefully drained and suspended in buffer D to less than 4 ml 
(fraction 4B). Insoluble material was removed by centrifu- 
gation at 4,000 x g. To concentrate the protein, 2 g of solid 
ammonium sulfate was added with gentle agitation until 
dissolved. The mixture was allowed to stand at 0°C for 30 
min, and then the precipitate was collected by centrifugation 
at 10,000 x g at 4°C for 30 min. The pellet was drained, and 
the precipitate was redissolved in 1 ml of buffer D (fraction 
5). In a separate purification, fraction 5 was run on a column 
(55 by 1.6 cm) of superfine Sephadex G-75 (Pharmacia) in 
buffer D at 6 ml/h. Peak fractions (10.5 ml total) were 
identified by electrophoresis and pooled. Protein was col- 
lected by ammonium sulfate precipitation (0.5 g added per 
ml), followed, after 2 h, by centrifugation at 10,000 x g for 30 
min. The final pellet (fraction 6) was redissolved to about 1 
ml in buffer D. When purified, the concentration of 3C was 
estimated by measuring the A280. The protein contains no
tryptophan and seven tyrosine residues per molecule of Mr
20,000; thus, the extinction coefficient at 280 nm is about
8,400 M⁻¹ cm⁻¹ and the A280 0.1% value is about 0.42 M⁻¹
cm⁻¹ (6).
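The arithmetic behind this estimate can be sketched as follows. This is a minimal illustration, not the authors' calculation: the per-tyrosine absorptivity of 1,200 M⁻¹ cm⁻¹ is an assumption back-calculated from the quoted total (8,400 / 7), and the function names are hypothetical. The 0.1% value is read here as the absorbance of a 1 mg/ml solution in a 1-cm path.

```python
# Assumed molar absorptivity of one tyrosine residue at 280 nm, inferred from
# the quoted total of 8,400 for seven residues (not taken from reference 6).
EPSILON_TYR_280 = 1200.0  # M^-1 cm^-1 (assumption)


def molar_extinction(n_tyr, eps_tyr=EPSILON_TYR_280):
    """Extinction coefficient of a tryptophan-free protein from its Tyr count."""
    return n_tyr * eps_tyr


def a280_at_one_mg_per_ml(epsilon, mol_weight):
    """Absorbance of a 0.1% (1 mg/ml) solution in a 1-cm path."""
    return epsilon / mol_weight


def concentration_mg_per_ml(a280, epsilon, mol_weight, path_cm=1.0):
    """Protein concentration estimated from a measured A280."""
    return a280 / (a280_at_one_mg_per_ml(epsilon, mol_weight) * path_cm)


epsilon_3c = molar_extinction(n_tyr=7)
a280_01pct = a280_at_one_mg_per_ml(epsilon_3c, mol_weight=20000.0)
print(epsilon_3c, a280_01pct)
```

With these inputs the sketch reproduces the two figures quoted above, so a measured A280 divided by 0.42 gives an estimate in milligrams per milliliter.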

Preparation and fractionation of infected HeLa cell lysate. 
HeLa R19 spinner cells (1.2 x 10⁹) were infected with
poliovirus type 1 (Mahoney) at 100 PFU per cell, as described
previously (8). Four hours later, the cells were swollen in
10 ml of buffer E at 0°C. Cells were broken by 15 strokes of
a tight-fitting Dounce homogenizer. The homogenizer was
drained and rinsed with 2 ml of the same buffer, which was
pooled with the lysate. The lysate (15 ml) was centrifuged at 
10,000 x g for 30 min. The supernatant (12 ml) was collected 
(S-10). A 10-ml volume of S-10 was centrifuged at 300,000 x 
g for 60 min. The remaining 2 ml of S-10 was mixed with 0.5 
ml of 50% (vol/vol) glycerol and stored at -80°C. The second
supernatant (S-300; 9.5 ml) was mixed with 2.5 ml of 50% 
glycerol. A 10-ml volume of this fraction was mixed with 4 g 
of ammonium sulfate. The precipitate was collected by 
centrifugation at 10,000 x g for 30 min and suspended to 1.5
ml in buffer F. Insoluble material was removed by centrifu- 
gation at 10,000 x g for 5 min. A portion (0.2 ml) of the 
supernatant was loaded on a Superose 12 HR 10/30 column 
(Pharmacia) and eluted at 0.2 ml/min in buffer F. Fractions 
(0.5 ml each) were collected up to 24 ml. Each fraction 
between elution volumes 11 and 17 ml was concentrated to 
approximately 75 µl by centrifugal ultrafiltration for 2 h at
5,000 x g (Centricon; Amicon Corp.). 

Correlated assays on P1 and peptide cleavage. Polypeptide
P1 was supplied in reticulocyte lysate translation mixtures,
as described previously (18), in which 40 µg of pMN22 RNA
per ml had been translated at 30°C for 60 min in the presence
of 1 mCi of [35S]methionine per ml, followed by addition of
nonradioactive methionine to 50 µM.

To test recombinant 3C with thiol-reactive inhibitors (E-64
and 1,3-dibromoacetone), the 3C (0.1 ml of a 0.5 mM
solution in buffer D) was treated with fresh reducing agent
(10 mM DTT), incubated at 30°C for 10 min, and then
separated from free reducing agent by gel filtration into
buffer G without DTT at 4°C. The effluent which did not
contain low-molecular-weight thiol was pooled and diluted
to 0.5 ml. To 90 µl of the enzyme solution (about 100 µM)
was added 10 µl of buffer G without DTT (control) or a fresh
solution of E-64 (to give 0.2 mM), dibromoacetone (to give
0.15 mM), or Triton X-100 (to give 1%). The mixtures were
incubated at 30°C for 15 min. From each reaction mixture, 10
µl was withdrawn and added to 20 µl of buffer G-10 mM
DTT, except for the final mixture, which was added to buffer
G-10 mM DTT-1% Triton X-100. In parallel experiments, 3-
and 10-µl volumes of infected S-10 were diluted to 30 µl with
buffer G-10 mM DTT and a further 10 µl of infected S-10 was
diluted with buffer G-10 mM DTT-1% Triton X-100. P1
translation mixture (4 µl) was then added with mixing, and
the samples were incubated at 30°C for 2 h. Protein was
precipitated by thorough admixture of 0.3 ml of acetone and
collected by centrifugation; it was then drained and dried
under vacuum. Pellets were redissolved in buffer S and
analyzed by electrophoresis in 15% acrylamide gels. Gels
were fluorographed with En3Hance as described by the
manufacturer (New England Nuclear Corp.) and used to
expose preflashed X-ray film (A350 = 0.2) for 18 h at -80°C.
Densitometry was performed on an LKB gel scanner.
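The serial dilutions in this protocol can be followed with a short calculation. This is a sketch under stated assumptions: the volume of P1 translation mixture is taken as 4 µl (the unit is illegible in the scanned text), the stock is taken as exactly 100 µM, and the helper name `dilute` is hypothetical.

```python
def dilute(conc_uM, vol_taken_ul, vol_final_ul):
    """Concentration after bringing vol_taken_ul of stock to vol_final_ul."""
    return conc_uM * vol_taken_ul / vol_final_ul


# Step 1: 90 ul of enzyme (about 100 uM) plus 10 ul of buffer or inhibitor.
preincubation_uM = dilute(100.0, 90.0, 90.0 + 10.0)

# Step 2: 10 ul of that mixture into 20 ul of buffer G-10 mM DTT, then
# 4 ul (assumed unit) of P1 translation mixture.
assay_uM = dilute(preincubation_uM, 10.0, 10.0 + 20.0 + 4.0)

print(round(preincubation_uM, 1), round(assay_uM, 1))
```

Under these assumptions the enzyme is at roughly 90 µM during preincubation and in the mid-20s µM in the P1 assay, in line with the "about 25 µM" quoted in the legend to Fig. 3.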

Meanwhile, 10 µl of DTT solution (to give 10 mM) was
added to the remaining 90 µl of the enzyme-inhibitor mixtures,
followed by 10 µl of the peptide substrate solution
(0.83 mM). These mixtures were also incubated for 2 h at
30°C. At the end of the incubation, the mixture was diluted
10-fold with water and analyzed on Mono-Q (see Fig. 5B,
legend).

Quantification of protein products. The integrated absorbance
of the P1, 1ABC, 1D, and, in some cases, 1C peaks was
recorded. To correct for errors in loading and preparation of
the reactions and deviations in lane width, the values of P1,
1ABC, and 1D were summed and then the fraction of the
sum which each band represented was calculated. To convert
to relative molar quantities (x_i) for each polypeptide (i),
these values were divided by the number of methionine
residues in each protein. In this construction, 1D contains 5,
1ABC contains 20, and P1 contains 25 Met residues.
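The normalization just described can be sketched in a few lines. The methionine counts come from the text; the band intensities are made-up illustrative numbers, and the definition of the extent of cleavage as x_1D / (x_1D + x_P1) is an assumption, since the text does not give the formula explicitly.

```python
# Methionine residues per polypeptide, as stated for this construction.
MET_COUNTS = {"P1": 25, "1ABC": 20, "1D": 5}


def relative_molar_quantities(integrated_absorbance, met_counts=MET_COUNTS):
    """Normalize band intensities to the lane total, then divide by Met count."""
    lane_total = sum(integrated_absorbance[band] for band in met_counts)
    return {
        band: (integrated_absorbance[band] / lane_total) / met_counts[band]
        for band in met_counts
    }


def extent_of_cleavage(x):
    """Fraction of P1 molecules cleaved to release 1D (assumed definition)."""
    return x["1D"] / (x["1D"] + x["P1"])


# Hypothetical densitometer readings for one lane (arbitrary units).
scan = {"P1": 25.0, "1ABC": 60.0, "1D": 15.0}
x = relative_molar_quantities(scan)
print(x, extent_of_cleavage(x))
```

In this invented lane the label splits 4:1 between 1ABC and 1D, so the two products come out equimolar after dividing by their methionine counts, as expected for a single cleavage of P1.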

RESULTS 

Expression of proteinase 3C in BL21(DE3). The cDNA 
segment encoding the 3C region of poliovirus type 2 (Sabin 
strain) (28) was placed in the vector pAR2113 (24) under 
control of a T7 promoter. The initiation codon of T7 gene 10
was immediately followed by the coding sequence of 3C,

FIG. 1. Analysis on sodium dodecyl sulfate-polyacrylamide gel
electrophoresis of biosynthesis of 3C in batch fermentation. Samples
were withdrawn from the fermentor immediately before induction
(lane U) and 2.5 h after induction (lane I) of expression with
isopropyl-β-D-thiogalactopyranoside. Lanes U and I were each
loaded with the protein derived from a 10-liter culture. The arrow in
lane I indicates the expressed 3C. Protein was visualized by staining
with Brilliant Blue R. Lane M contained molecular size markers.
The numbers on the right indicate apparent molecular size (in
kilodaltons).



followed by a termination codon. The completed plasmid, 
pMN35, was transferred into the expression strain, BL21 
(DE3). The expression system is described in detail else- 
where (24, 26). Briefly, the plasmid contains a T7 promoter 
inserted into the BamHI site of pBR322, which directs
transcription by T7 RNA polymerase, the gene 1 product,
toward the promoter of the bla gene. Gene 1 itself is inserted
into the host chromosome, where its expression is controlled
by the lacUV5 promoter, and is thus inducible by
isopropyl-β-D-thiogalactopyranoside. Induction of gene 1 leads to
rapid and overwhelming transcription of plasmid-specific 
RNA. 

To prepare enough cell mass to purify recombinant 3C 
preparatively, a benchtop fermentor was used as described 
in Materials and Methods. Figure 1 illustrates the induction 
of expression. 

Purification of recombinant 3C. Since the yield of 3C was 
not very great, a simple purification scheme (detailed in 
Materials and Methods; the results are shown in Table 1 and 
Fig. 2) was required that could be readily repeated to yield 
sufficient material for X-ray crystallographic analysis. Only 
5% of all the protein extracted from bacteria was not 
adsorbed by DEAE-cellulose (Table 1; Fig. 2, compare lanes
2 and 3). Roughly 50% of this was 3C. The major contami- 
nant was not precipitated by ammonium sulfate (Fig. 2, 



TABLE 1. Purification of biosynthetic 3C from 15 g of E. coli BL21(DE3)(pMN35) cell paste

Fraction  Description                  Vol    Concn           Total protein   % yield  % purity
no.                                    (ml)   (mg/ml)         (mg)
1         Crude lysate                 80     8.7(a)          690(a)          100      4
2         S-360                        200    1.2(a)          240(a)          65       8
3         DEAE-cellulose effluent      320    0.03(a)         10(a)           50       50
5         Protein in second            1.3    3.4(a); 6.8(c)  4.3(a); 8.9(c)  29(b)    90
          (NH4)2SO4 precipitation

(a) Determined by Bio-Rad protein assay (dye binding).
(b) Calculated from 3C yield in fraction 5, total protein in fraction 1, and densitometry scan of Fig. 2, lane 1.
(c) Determined by A280 from the estimated extinction coefficient (see Materials and Methods).

FIG. 2. Purification of recombinant 3C. For lanes 1 to 5a, a fixed
proportion (1/10,000) of certain fractions (as indicated) was loaded in
buffer S. Lanes 5b and 5c were loaded with 4 and 10 µg (by A280),
respectively, of fraction 5. Lane 6 contained 10 µg of fraction 6.
Protein was visualized by staining with Brilliant Blue R. The
numbers on the right (M) indicate apparent molecular sizes (in
kilodaltons).



compare lanes 3 and 4). After reprecipitation from a small 
volume, 3C was 90% pure by densitometry (Fig. 2, lanes 5a, 
b, and c), and material of this quality was used in most of our 
experiments. Ninety-eight percent pure material was ob- 
tained by adding a single step of molecular size chromatog- 
raphy (Fig. 2, lane 6) with around 90% recovery. The overall 
recovery of protein was 30%, and one can isolate 8 mg of 3C 
from 15 g of wet E. coli cell paste. 
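The overall recovery can be cross-checked from the figures reported in Table 1. This is only a sketch: it assumes the A280-based concentration (6.8 mg/ml) is the relevant one for fraction 5 and the dye-binding value for the crude lysate, which is one reading of the table footnotes; the variable names are hypothetical.

```python
# Fraction 1 (crude lysate): volume, dye-binding concentration, purity.
crude_vol_ml, crude_conc_mg_ml, crude_purity = 80.0, 8.7, 0.04

# Fraction 5 (second ammonium sulfate precipitate): volume, A280-based
# concentration (assumed to be the relevant figure), purity.
final_vol_ml, final_conc_mg_ml, final_purity = 1.3, 6.8, 0.90

crude_3c_mg = crude_vol_ml * crude_conc_mg_ml * crude_purity
final_3c_mg = final_vol_ml * final_conc_mg_ml * final_purity
overall_yield = final_3c_mg / crude_3c_mg

print(round(crude_3c_mg, 1), round(final_3c_mg, 1), round(overall_yield, 2))
```

On these inputs the estimate is about 8 mg of 3C recovered from roughly 28 mg in the crude lysate, which agrees with the recovery of about 30% stated in the text.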

The N-terminal sequence of the protein product was 
determined up to residue 10 by automated gas phase Edman 
degradation. The size of signal was commensurate with the 
sample loaded, and the data were unambiguous and did not 
deteriorate appreciably over the 10 cycles. The sequence 
determined was Met-Gly-Pro-Gly-Phe-Asp-Tyr-Ala-Val- 
Ala, i.e., the predicted amino terminus of 3C but with an 
additional methionine. Some of the purified 3C was used to 
raise antibodies in rabbits. The antisera strongly recognized
3C from cells infected with type 1 poliovirus (see Fig. 4A; 
compare lanes 31 and 32 with the recombinant 3C standards 
on the right of the blot). 

Activity of recombinant 3C on capsid precursor P1 expressed
in vitro. In preliminary experiments, crude extracts
were prepared from induced BL21(DE3)(pMN35) and
BL21(DE3)(pMN42), of which the latter plasmid leads to
expression of the 3C protein from EMCV. No specific
cleavage of poliovirus P1, expressed in vitro, was obtained
with the latter expression system, but it was highly active
against the EMCV 3C substrate, LVP0 (data not shown).
The precursor polypeptide LVP0 of EMCV was also expressed
in vitro by us, as previously described (20). Processing
of poliovirus P1 by the bacterial lysate containing
poliovirus 3C (the former) was qualitatively the same as that
observed later with the purified protein (data not shown).

We used in vitro translation products of synthetic mRNA
encoding the capsomer precursor, P1, derived from poliovirus
type 1 (Mahoney) cDNA (18) to monitor QG-ase activity.
After 2 h of incubation in buffer, no apparent change in the

FIG. 3. Proteolytic processing of poliovirus proteins. (A) Schematic
representation of the poliovirus polyprotein (heavy line).
Individual polypeptides (not drawn to scale) are designated according
to the nomenclature of Rueckert and Wimmer (25). The polyprotein
is divided conceptually into the following three domains: P1,
the capsid region which is eventually processed into the four capsid
proteins 1A, 1B, 1C, and 1D (also called VP4, VP2, VP3, and VP1,
respectively); P2, which yields 2A, 2B (not indicated), and 2C; and
P3, which yields 3A, 3B (VPg; not indicated), 3C, 3D (the polymerase),
and a variety of alternative cleavage products. Proteinase 2A
(29) (and its precursors) cleaves at specific Tyr-Gly pairs (the P1-P2
site and a site in 3D). With the exception of the 1A-1B cleavage site,
all other cleavages are made by 3C or its precursors (9). For a
complete processing map, see Pallansch et al. (19). Note that 1AB is
cleaved to 1A and 1B when the viral RNA is encapsidated and
probably involves neither 2A nor 3C (2). The reaction has not been
reproduced in vitro. (B, left panel) Comparison of processing of P1,
expressed in vitro, by crude extracts of poliovirus-infected cells and
by purified recombinant 3C. See Materials and Methods for conditions.
Lanes: 1, no QG-ase; 2, no QG-ase and 1% Triton X-100; 3, 3
µl of infected S-10; 4, 10 µl of infected S-10; 5, 10 µl of S-10 and 1%
Triton X-100; 6, about 25 µM 3C; 7, about 25 µM 3C and 1% Triton
X-100; 8, extract of infected cells labeled for 3 h postinfection. 1AB,
1C, and 1D are indicated with arrows. (B, right panel) Effect of
preincubation with thiol-reactive inhibitors on about 0.1 mM recombinant
3C for 15 min. Lanes: 9, 0.2 mM E-64; 10, 0.15 mM
1,3-dibromoacetone; 11, no inhibitor; 12, infected S-10 as in lane 4;
13, markers (as in lane 8). The proteinase was diluted to about 25
µM in the reaction mixture.



products of the translation was observed. Figure 3B, lane 1,
shows the products of the control incubation. In lanes 3 and
4, infected S-10, 3 and 10 µl, respectively, was used as the
source of QG-ase. As reported previously (18), processing
yielded the three capsomer proteins 1AB, 1C, and 1D. It is
clear that the comparative rates of processing at the two
cleavage sites are similar, since all three capsomer proteins
were observed even while processing of P1 was partial (lane
3). In lane 6, about 25 µM purified biosynthetic 3C was
included in the incubation, giving almost complete cleavage
of P1 but yielding only 1ABC and 1D, with no detectable
cleavage of 1ABC to 1AB and 1C. We were able to detect a
trace of 1C (identified by its mobility on gel electrophoresis)

FIG. 4. Fractionation of soluble QG-ase from poliovirus-infected
cells on fast-protein liquid chromatography with Superose 12 HR 10/30
(bed volume, 23.6 ml). The sample size was 0.2 ml, and fractions
were 0.5 ml each but were concentrated to 75 µl (see Materials and
Methods). Effluent from 11 to 17 ml (fractions 23 to 34) was analyzed
in detail. (A) Immunoblotting of 10 µl of each fraction (after sodium
dodecyl sulfate-polyacrylamide gel electrophoresis) was performed
as described by Krausslich et al. (16), except that the transfer was
for 2 h only at 0.2 A. Rabbit anti-3C serum raised against the
recombinant protein was used at a 1/300 dilution. Detection was
with goat anti-rabbit immunoglobulin G coupled to alkaline phosphatase,
with indolyl phosphate-Nitroblue Tetrazolium chloride as
the indicator substrate system. The positions of the 3C-related
proteins in the infected S-10 are labeled at the right margin. (B) A
10-µl volume of each fraction was incubated with 2 µl of P1
translation mixture for 3 h at 30°C. The gel was prepared, fluorographed,
and scanned as described in the text, and the extent of
cleavage (0 to 1) to yield 1D (open circles) and the molar ratio of
1C/1D (filled circles) were calculated. The arrows indicate the ordinates
corresponding to open or filled circles.

at still higher 3C concentrations (data not shown). We
estimate that the concentration of 3C-related proteins in the
reaction mixture analyzed in lane 3 was about 1/100 of that
present in the mixture analyzed in lane 6 (compare the S-10
lane with the 3C standards in Fig. 4A). Although recombinant
3C was active against P1, its activity was apparently
low and restricted largely to cleavage of the 1ABC-1D bond.
In separate experiments, we were able to determine the rate
constant, kcat/Km, to be 80 M⁻¹ s⁻¹ for the interaction of 3C
with P1 (data not shown).
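To see what a kcat/Km of 80 M⁻¹ s⁻¹ implies for the 2-h assay, the following sketch treats cleavage as pseudo-first order. That treatment is an assumption not stated in the text: it requires the substrate concentration to be well below Km, so that the observed rate constant is (kcat/Km) x [enzyme]. The function name is hypothetical.

```python
import math

KCAT_OVER_KM = 80.0  # M^-1 s^-1, value quoted in the text


def fraction_cleaved(enzyme_molar, seconds, kcat_over_km=KCAT_OVER_KM):
    """Fraction of substrate cleaved, assuming [S] << Km (pseudo-first order)."""
    k_obs = kcat_over_km * enzyme_molar  # s^-1
    return 1.0 - math.exp(-k_obs * seconds)


two_hours = 2 * 3600
at_25_uM = fraction_cleaved(25e-6, two_hours)      # as in Fig. 3B, lane 6
at_0_25_uM = fraction_cleaved(0.25e-6, two_hours)  # 100-fold less enzyme

print(round(at_25_uM, 3), round(at_0_25_uM, 3))
```

Under this assumption, 25 µM enzyme would cleave essentially all of the substrate in 2 h, whereas a 100-fold lower concentration would cleave only a small fraction, which is qualitatively what the comparison of lanes 3 and 6 suggests for 3C alone.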

When the same experiments were performed in the pres- 
ence of 1% Triton X-100, no qualitative difference could be 
observed (Fig. 3B, compare lanes 6 and 7) when recombinant
3C was used as the QG-ase. Under the same conditions,
the QG-ase of S-10 was completely inhibited (compare lanes
4 and 5).

Recombinant 3C and 3C from infected cells have identical
activities. Recombinant 3C (unlike the QG-ase from infected
cells) does not cleave P1 of poliovirus type 1 rapidly to yield
1AB, 1C, and 1D; instead, it yields only 1ABC and 1D (Fig.
3). The purified recombinant 3C is derived from the poliovirus
type 2 genome and differs from the 3C in cells infected
with poliovirus type 1 (Mahoney) at three amino acid positions.
This should have no effect on P1 processing, however,
because intertypic recombinants of poliovirus readily grow
in tissue culture (23). The recombinant 3C also retains the
N-terminal methionine which results from the synthetic
initiation codon in the expression vector. We therefore
partially purified 3C and its precursors from the soluble
fraction of infected cell extracts to determine their properties.
3C has been identified as the only polioviral protein
which does not sediment mainly with membranes (27).
Soluble proteins in the S-300 of cell lysate were concentrated
by ammonium sulfate precipitation and fractionated by molecular
size. In a preliminary experiment, the size range over
which the bulk of QG-ase was eluted was determined. A
finer analysis was made with three assays (Fig. 4) over the
range of QG-ase as follows. (i) We performed an immunoblotting
analysis (Fig. 4A) with anti-3C serum. (ii) We
determined the proportion of P1 that was cleaved to release
1D (Fig. 4B, open circles) in an assay. (iii) We calculated the
molar ratio of 1C to 1D (x_1C/x_1D; Fig. 4B, filled circles).
Since the measurements involved in calculating the third set
of data were complicated, large proportional experimental
errors were expected; thus, the difference in x_1C/x_1D between
fractions 28 and 30 is probably not significant. The
fractions which contained the peak of QG-ase activity (26 to
28) in the P1 assay did not contain 3C but did contain 3C',
3CD, and possibly other 3C-related precursors. 3CD was
also detected in fraction 30 in an immunoblot treated with 10
times more concentrated anti-3C serum. It is also clear that
fractions 31 and 32 contain abundant 3C and undetectable
levels of 3C precursors and generate very little 1C in a P1
cleavage assay, despite extensive processing to yield 1ABC
and 1D. The activity of the 3C (which is approximately 1
µM) in these fractions therefore accords well both quantitatively
and qualitatively with that of recombinant 3C.

A peptide substrate specifically processed by 3C. Two
procedures were used to detect the products of cleavage of
the fluorescent peptide dansyl-EEEAMEQGITNK-NH2 by
recombinant 3C. In one approach (Fig. 5A and B), the
peptide was subjected to extended digestion (50 µM peptide-
12.5 µM 3C in buffer D for 14 h at 30°C), and all of the new
UV-absorbing products which did not arise from the buffer
alone were isolated by high-pressure liquid chromatography
and analyzed by fast-atom bombardment mass spectroscopy
(data not shown). Total conversion of the parent material
(Fig. 5A, peak 1) to peaks 2 and 3 (Fig. 5B) was observed.
The molecular masses of peaks 2 and 3 are consistent with
the expected masses of dansyl-EEEAMEQ and GITNK-NH2,
respectively, the two products of cleavage at the
Gln-Gly bond.
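A quick mass balance on the molecular ions reported in the legend to Fig. 5 illustrates why the assignment is plausible. This is a sketch with stated assumptions: hydrolysis of one peptide bond adds one water (about 18.0 Da), methionine sulfoxide formation adds one oxygen (about 16.0 Da), and whether the reported values are neutral masses or protonated ions is not stated, so a tolerance of about 1.5 Da is allowed.

```python
WATER = 18.011   # Da, mass added when one peptide bond is hydrolyzed
OXYGEN = 15.995  # Da, mass added on oxidation of Met to Met sulfoxide

# Molecular ions reported in the legend to Fig. 5.
peaks = {"1": 1610.6, "2": 1098.0, "3": 531.0, "1'": 1626.7, "2'": 1114.0}


def hydrolysis_residual(parent, product_a, product_b):
    """Difference between the summed products and parent plus one water."""
    return (product_a + product_b) - (parent + WATER)


def oxidation_shift(oxidized, native):
    """Mass gained by the presumed sulfoxide derivative."""
    return oxidized - native


residual = hydrolysis_residual(peaks["1"], peaks["2"], peaks["3"])
shift_parent = oxidation_shift(peaks["1'"], peaks["1"])
shift_product = oxidation_shift(peaks["2'"], peaks["2"])

print(round(residual, 2), round(shift_parent, 1), round(shift_product, 1))
```

The two product ions sum to the parent plus one water within rounding, and the primed peaks sit about 16 Da above peaks 1 and 2, as expected for oxidation of the single methionine that lies on the dansyl-EEEAMEQ side of the cleavage site.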

In separate experiments, the peptide (83 µM peptide-50
µM 3C in buffer D for 2 h at 30°C) was approximately 50%
converted by recombinant 3C to a new fluorescent material
with affinity for Mono-Q greater than that of the parent
peptide (Fig. 5C and D and legend). After 2 h of incubation,
0.5 ml of acetone at room temperature was added to an
identical reaction mixture to denature the protein and stop
the reaction. The reaction mixture was dried in vacuo, and

FIG. 5. Processing of the peptide dansyl-EEEAMEQGITNK-NH2 by biosynthetic 3C. (A) Reverse-phase analysis of the substrate
peptide. (B) Analysis of cleavage reaction. The peptide (50 µM) was incubated with 12.5 µM 3C for 14 h at 30°C in a final volume of 0.1 ml
of buffer C. Samples were run on a Vydac C-18 column in a linear 60-ml gradient at 2 ml/min, containing 5 to 100% solution B mixed in solution
A. Solution A was 0.1% trifluoroacetic acid in water, and solution B was 0.1% trifluoroacetic acid in 40% water-60% acetonitrile. The peak
fractions were pooled and analyzed by fast-atom bombardment mass spectroscopy. Peak 1 yielded a molecular ion of Mr 1,610.6, as expected
of the uncleaved peptide. Peak 2 yielded a molecular ion of Mr 1,098, as expected of the peptide dansyl-EEEAMEQ. Peak 3 yielded a
molecular ion of Mr 531, as expected of GITNK-NH2. Peaks 1' and 2' yielded molecular ions of Mr 1,626.7 and 1,114, respectively, as
expected from the methionine sulfoxide derivatives of peaks 1 and 2. (C and D) Ion-exchange analysis. The peptide (83 µM) was incubated
in buffer G with either no enzyme (C) or 50 µM 3C in 0.1 ml (D) for 2 h. The mixtures were diluted 10-fold and resolved on fast-protein liquid
chromatography with Mono-Q HR5/5. After sample application, the column was washed with 3 ml of 20 mM bis-Tris hydrochloride (pH 6.7),
followed by a 17-ml gradient from 0 to 50% 20 mM bis-Tris hydrochloride-1 M NaCl (pH 6.7) mixed with 20 mM bis-Tris hydrochloride (pH
6.7). The fluorescent fractions were detected by standing the tubes over a 330-nm transilluminator and are indicated by horizontal bars labeled
F. The reaction mixture was also analyzed as described in Results.



the dry pellet was extracted with dimethylformamide and 
centrifuged, and the solution was transferred to a new tube 
for drying in vacuo. A portion of this material was analyzed 
by gas phase automated Edman degradation for six cycles. 
The sequence Gly-Ile-Thr-Asn-Lys was obtained (data not
shown), with no amino acid derivative detected in cycle 6.
Small contaminating peaks (running with the derivatives of
Asp and Ala in cycle 1, with Gln in cycle 2, and with Ile in
cycle 3) did not represent any other sequence within the
substrate peptide. Together, these data show that the en- 
zyme preparation cleaved the synthetic peptide at the Gln- 
Gly cleavage site. 

Correlation of activity of inhibitors in P1 and peptide
assays. In separate experiments, recombinant 3C (at approximately
90 µM) was pretreated with the two cysteine proteinase
inhibitors L-trans-epoxysuccinylleucylamido(4-guanidino)butane
(E-64; 0.2 mM [3]) and 1,3-dibromoacetone
(0.15 mM [11]). A control reaction contained only enzyme
and buffer (see Materials and Methods). After addition of
excess thiol, the mixtures were tested for activity in both the
P1 and peptide assays. In the control assays, P1 was almost
completely converted to 1ABC and 1D (Fig. 3B, lane 11) and
the fluorescent peptide was detected under UV illumination
as running almost completely in the position of the product
peak, as in Fig. 5D (data not shown). The pattern of neither
assay was changed (Fig. 3B, lane 9) by preincubation of 3C
with E-64; hence, E-64 does not appear to inhibit the enzyme
under these conditions. However, after treatment with
dibromoacetone (Fig. 3B, lane 10), P1 was only weakly
cleaved, whereas in fast-protein liquid chromatography
roughly half of the fluorescent peptide migrated in the
position of the unmodified peptide and half migrated in the
position of the processed peptide (data not shown). Thus,
dibromoacetone appeared to inhibit the activity of 3C in both
reactions.

DISCUSSION 

Expression of 3C in E. coli. Ivanoff et al. (12) have
previously reported efficient expression (15% of total pro- 
tein) of type 1 (Mahoney) poliovirus 3C from a trp promoter 
in E. coli, but the product was described as largely insoluble. 
Large amounts of internal initiation products are observed 
when poliovirus type 1 cDNA is used for expression of 3C 
(9, 12; M.J. H.N. and E.W., unpublished data). We ex- 
pressed 3C derived from poliovirus type 2 (Sabin) cDNA, 
which does not give internal initiation in the 3C region, 
probably because of a single-base difference upstream of 
Met-27. The level of expression of the protein product of our 
system was only 4% by densitometry of a stained gel (Fig. 1,
compare lanes U and I), but it was largely extracted into a
high-speed supernatant during purification (Fig. 2, lanes 1 
and 2) and was easily purified. Other workers have also 
recently expressed type 2 3C, derived from the same cDNA 
clone, from an E. coli promoter (A. Nomoto, personal 
communication). However, like Ivanoff et al., they have 
found that their product is insoluble but active after renatur- 
ation. 

The N-terminal methionine of our recombinant protein 
was unexpectedly retained (4, 30). However, as described 
below, the recombinant and the natural 3C seemed compa- 
rable in activity. 

The processing reaction. Jackson (13) has shown in syn- 
chronized in vitro translations of EMCV RNA that the 
precursors of 3C are active in cleaving the capsid precursor 
in this system. EMCV 3C, however, is fully active upon the 
complex EMCV capsid precursor (20). There has previously 
been no clear report that 3C of poliovirus alone is a QG-ase. 
Here we show that recombinant 3C is proteolytically active. 
However, pure 3C cleaves only one of the two peptide bonds 
in P1 which are cleaved in vitro by the QG-ase from infected 
cell extracts (Fig. 3B) and are readily processed in vivo. 
Ypma-Wong and Semler (32) have reported previously that 
on translation in vitro (where the concentration of enzyme 
and substrate would be in the range of 10⁻⁸ M), the entire 
3CD region must be translated for processing of P1 
(synthesized from the same mRNA) to be observed, whereas the P2 
and P3 regions apparently require only the 3C region itself. 
Our kinetic data (M.J.H.N. and E.W., unpublished data) 
show that 3C would need to be 2 orders of magnitude more 
concentrated to produce detectable processing of P1 in this 
system. More recent work, in which P1, P2, and P3 
substrates were translated independently of the proteinase 
activity, has supported these findings (13a, 31a; H.-G. 
Krausslich, C. Hellen, M. J. H. Nicklin, and E. Wimmer, 
unpublished data). It is likely, therefore, that the Km of 3C 
for the P2 and P3 substrates is much lower than that for P1. 
We have shown that soluble precursors of 3C, when partially 
purified from infected cells, are highly active on P1, although 
3C itself is apparently absent from these fractions. These 
precursors cleave P1 fully to 1AB, 1C, and 1D. The 3C from 
infected cells shows weak activity similar to that of 
recombinant 3C, yielding 1ABC and 1D. The implication of these 
findings is that 3C lacks a domain required for effective 
interaction with P1. Our observation that the QG-ase from 
infected cells is rendered inactive (or merely as inactive as 
3C) upon the P1 substrate by inclusion of detergent in the 
reaction mixture, whereas high concentrations of 3C are 
unaffected, suggests that the interaction of the functional 
QG-ase and P1 is mediated by a hydrophobic interaction. In 
this respect it would be relatively simple to test for the 
involvement of the N-terminal myristoyl moiety of P1 (5, 
21). 

It has been reported that 3CD is not active as a polymerase 
(31); hence, the processing (by the QG-ase) of 3CD to 3C and 
3D represents a potential regulatory step in which the P1 
QG-ase is inactivated and the polymerase is activated. 

Poliovirus 3C as a peptidase. Clearly, the most desirable 
form of assay for a proteinase is a peptidolytic assay. It 
would also open up a convenient route for quantitative 
studies of the primary specificity of the enzyme and the 
investigation of inhibitors. An esterolytic assay of poliovirus 
proteinase 2A has been reported (15), but none has been 
reported for 3C. The peptide sequence EEEAMEQGITN 
was selected to represent the QG-ase-cleavable site between 
polypeptides 2A and 2B, which is efficiently cleaved in vivo. 



The peptide synthesized was dansyl-EEEAMEQGITNK-NH2. 
The reaction catalyzed by 3C yielded two peptides 
with molecular masses corresponding to those of the 
products of cleavage between Gln and Gly, and amino-terminal 
sequencing of all dimethylformamide-soluble peptides 
present in the mixture yielded a single N-terminal sequence 
corresponding to the C-terminal product, GITNK, as 
expected from the Gln-Gly specificity of the enzyme on its 
protein substrates. Synthetic peptides not containing a 
proper QG site were entirely resistant to even large amounts 
of 3C (P.V.P. and E.W., unpublished data). E-64, a highly 
specific, irreversible inhibitor of the papain superfamily 
(including many lysosomal cysteine proteinases) (3), 
apparently did not react with recombinant 3C to inhibit its activity 
in either the peptidolytic or the P1 cleavage assay. 
Dibromoacetone, on the other hand, inhibited the enzyme 
correspondingly. The latter inhibitor is not likely to exhibit much 
specificity, although it may cross-link the active-site Cys and 
His residues (11). 
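
The expected products of the peptidolytic assay can be illustrated with a minimal sketch. It assumes only the Gln-Gly rule stated above and ignores any contribution of flanking residues to recognition; the function name is hypothetical and the dansyl and amide end groups are left out of the string.

```python
def cleave_at_gln_gly(peptide: str) -> list[str]:
    """Split a one-letter peptide after each Gln (Q) that is followed by Gly (G)."""
    fragments = []
    start = 0
    for i in range(len(peptide) - 1):
        if peptide[i] == "Q" and peptide[i + 1] == "G":
            fragments.append(peptide[start:i + 1])  # fragment ends with Gln
            start = i + 1                           # next fragment begins with Gly
    fragments.append(peptide[start:])
    return fragments


# Peptide portion of dansyl-EEEAMEQGITNK-NH2 (end groups omitted).
substrate = "EEEAMEQGITNK"
products = cleave_at_gln_gly(substrate)
print(products)  # N-terminal product, then the C-terminal product GITNK
```

A peptide without a QG pair is returned as a single uncleaved fragment, which mirrors the resistance reported for peptides lacking a proper QG site.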
Communicated by David Baltimore, November 7, 1984 

ABSTRACT We have purified from Moloney murine 
leukemia virus (Mo-MuLV) a protease that has the capacity of 
accurately cleaving the polyprotein precursor Pr65^gag into the 
mature viral structural proteins. Both the NH2- and 
COOH-terminal amino acid sequences have been determined and 
aligned with the amino acid sequence deduced from the DNA 
sequence of Mo-MuLV by other workers. The results show 
that: (i) the protease is located at the 5' end of the pol gene, and 
the first four amino acids are overlapped with the 3' end of the 
gag gene; (ii) the fifth amino acid residue is glutamine, which 
is inserted by suppression of the UAG termination codon at the 
gag-pol junction; and (iii) the protease is composed of 125 
amino acids with calculated Mr = 13,315, and the COOH 
terminus of the protease is adjacent to the NH2 terminus of 
reverse transcriptase. The map order of the gag-pol gene is 
proposed to be 5'-p15-p12-p30-p10-protease-reverse 
transcriptase-endonuclease-3'. 



The internal structural proteins of murine leukemia virus 
(MuLV) are encoded by the group-specific antigen (gag) 
gene and synthesized as a precursor polyprotein designated 
Pr65^gag. In addition to the gag gene, all 
replication-competent retroviruses possess a polymerase (pol) and an 
envelope (env) gene which have been mapped as 
5'-gag-pol-env-3'. Although the gag and pol genes are separated by an 
amber termination codon (UAG), translation of the 
genome-size mRNA yields, in addition to Pr65^gag, a larger precursor 
designated Pr180^gag-pol (for review see ref. 1). Jamjoom et al. 
(2) suggested that the synthesis of this gag-pol polyprotein, 
which is made in amounts 4-10% of those for Pr65^gag, may be 
translationally controlled. Using an in vitro translational 
system and yeast suppressor tRNA, Philipson et al. (3) 
provided evidence that synthesis of Pr180^gag-pol was 
enhanced by suppression of an amber termination codon. 

During virus maturation Pr65^gag is proteolytically cleaved 
into the final products designated p15, p12, p30, and p10. 
The processing is accomplished by a virion-associated 
protease (4), which first cleaves Pr65^gag into Pr27^gag (p15 + 
p12) and Pr40^gag (p30 + p10), the two major intermediate 
cleavage products (5, 6). However, the origin of protease 
(viral or cellular) remained unknown. In avian retroviruses a 
protein designated p15 and encoded by the 3' end of the gag 
gene has been shown to have associated protease activity (7, 
8), but the gag proteins of MuLV were not found to cleave 
the precursor. Genetic studies by Traktman et al. (9) with 
conditional maturation mutants of MuLV have indicated the 
importance of Pr180^gag-pol for the proteolytic processing of 
Pr65^gag. Levin et al. (10), who studied a natural pol 
frameshift mutant, confirmed and extended these 
observations and predicted the map position of a putative virally 
coded protease to be 5' to the reverse transcriptase coding 
region. We have determined the NH2-terminal sequence of 
the 80-kilodalton reverse transcriptase (11) derived from 
Pr180^gag-pol and located its genetic locus to begin 360 
nucleotides downstream from the amber termination codon 
positioned at the end of the gag gene (12). This finding 
suggested to us that the pol gene segment upstream to the 
codon specifying the NH2 terminus of reverse transcriptase 
may code for an approximately 14-kilodalton additional 
polypeptide, and we hypothesized that it may be the 
protease since the deduced primary and more importantly the 
predicted secondary structure of the putative protein 
resembled those of avian myeloblastosis virus and avian sarcoma 
virus protease (13, 14). 

In this communication we report the purification and 
primary structure analysis of a protease from Moloney 
(Mo)-MuLV. These data provide the evidence that the 
protease is encoded by the gag-pol gene and is synthesized 
by a translational readthrough of the amber termination 
codon for the gag gene. 

MATERIALS AND METHODS 

Viruses. Mo-MuLV was grown in BALB/c mouse bone 
marrow JLS-V9 cells (MJD-54 cells) kindly supplied by K. 
Manly (Roswell Park Memorial Institute, Buffalo, NY). 
Gazdar murine sarcoma virus (Gz-MSV) was grown in 
HTG-2 cells (15). The viruses were purified by sucrose 
density gradient centrifugation and obtained from the Bio- 
logical Products Laboratory, Program Resources, Inc., Na- 
tional Cancer Institute-Frederick Cancer Research Facility 
(Frederick, MD). 

Assay of Protease. Gz-MSV, which itself has no protease 
activity and contains uncleaved Pr65^gag as its major core 
protein, was the source of the polyprotein substrate. Pro- 
tease activity was assayed as previously described (6). 

Extraction of Protease Activity from Virus. To 50 mg of 
purified Mo-MuLV suspended in 2 ml of 0.13 M NaCl/0.01 
M Tris·HCl, pH 7.2/0.001 M EDTA (STE buffer), 20 vol of 
cold acetone (-70°C) was added, and then the suspension 
was centrifuged at 5000 rpm for 10 min at 4°C in a Sorvall 
SS-24 rotor. The precipitate was dried under reduced pres- 
sure. To solubilize the protease, extraction of the acetone 
powder (4°C, 30 min with constant stirring) was done 
stepwise first with 10 ml of 0.02 M 
piperazine-N,N'-bis(2-ethanesulfonic acid) (Pipes), pH 7.0/5 mM dithiothreitol (PD 
buffer) alone, PD buffer plus 0.1 M NaCl, PD buffer plus 0.5 
M NaCl, and finally PD buffer plus 2.0 M NaCl. Each 
aqueous extract was centrifuged at 10,000 rpm for 10 min at 
4°C in a Sorvall SS-24 (SS-1) rotor. Aliquots were dialyzed 
against PD buffer and assayed for protease activity. Extracts 



Abbreviations: MuLV, murine leukemia virus; Mo-MuLV, 
Moloney MuLV; Gz-MSV, Gazdar murine sarcoma virus; AKV, 
AKR mouse leukemia virus; FeLV, feline leukemia virus; BaEV, 
baboon endogenous virus; RP-HPLC, reversed-phase 
high-performance liquid chromatography. 
having protease activity were pooled, concentrated by 
lyophilization, and saved for further purification. 

NaDodSO4/PAGE. Various protein materials were 
analyzed by discontinuous NaDodSO4/PAGE (16). 
Specifically, to separate low molecular weight proteins, a 32-cm 
&-18% polyacrylamide gradient gel was used as described 
(4). Visualization of proteins was by staining with either 
Coomassie brilliant blue R-250 or silver (17). 

Ion-Exchange and Gel-Permeation Chromatography. Frac- 
tionation of protease by phosphocellulose and Sephadex 
G-75 chromatography was done as previously described (4). 

Reversed-Phase High-Performance Liquid Chromatography 
(RP-HPLC). Lyophilized samples shown to have 
protease activity were dissolved in saturated guanidine·HCl 
(Gdn·HCl), and further fractionated by RP-HPLC (18) on a 
μBondapak C18 column (Waters Associates). Five-milliliter 
fractions were collected and aliquots were taken for protein 
composition analysis on NaDodSO4/PAGE and protease 
activity measurement. 

NH2-Terminal Microsequence Analysis. Semi-automated 
microsequence analysis was performed with a Beckman 
sequencer model 890C equipped with a cold trap accessory 
as described (19). Phenylthiohydantoin derivatives of amino 
acids were identified and quantitated by HPLC (20). 

COOH-Terminal Sequence Analysis. Protein samples were 
digested with carboxypeptidase Y (21) for various time 
intervals and the released amino acids were quantitated on 
the Durrum 500 analyzer. 

RESULTS 

Purification of Protease. In initial studies designed to 
purify the Mo-MuLV protease, the proteolytic activity was 
first concentrated and fractionated by phosphocellulose 
chromatography using stepwise elution with increasing 
concentration of KCl. The activity that eluted at 0.3 M KCl was 
further fractionated by gel filtration on Sephadex G-75. Each 
fraction was assayed for protease activity by incubating 
aliquots with disrupted Gz-MSV and subsequently 
determining the protein pattern by NaDodSO4/PAGE. Shown in Fig. 
1 are the electrophoretic profiles of Sephadex G-75 fractions 
20-23 after they were incubated alone or with the substrate 
(Gz-MSV Pr65^gag). These four fractions were found to have 
protease activity as judged by the decrease in band intensity 
of Gz-MSV Pr65^gag and concomitant appearance of Pr40^gag, 
Pr27^gag, p30, and p10 bands, which were readily detectable. 
It is also seen by this semiquantitative assay that fractions 21 
and 22 had the peak activity inasmuch as Pr40^gag, the 
proximal precursor for p30 and p10 (4), completely 
disappeared and was further cleaved into the final products p30 
and p10. It was observed previously as well as in the present 
studies that, in contrast to Pr40^gag, the intermediate cleavage 
product, Pr27^gag, is relatively difficult to process in vitro to 
the constituent p15 and p12. 

Attempts to purify the protease to homogeneity by con- 
ventional methods as described were unsuccessful. Al- 
though the protease could be concentrated considerably (in 
some cases 500-fold on protein basis), it was difficult to 
identify which protein was responsible for proteolytic activ- 
ity. The protein patterns of concentrated phosphocellu- 
lose/Sephadex G-75 fractions made visible by staining after 
NaDodSO4/PAGE (see Fig. 1) were very complex. More 
than 10 proteins were detected in the 10- to 20-kilodalton 
region of NaDodSO4/PAGE, where the protease itself 
grates (4). 

To purify the protease in sufficient amounts and purity for 
structure analysis we utilized RP-HPLC. For these studies 
we first prepared an acetone powder from purified virus. 
From this powder we solubilized the protease by stepwise 
extraction with PD buffer having increasing concentrations 
Fig. 1. Cleavage of Gz-MSV Pr65^gag with partially purified 
protease of Mo-MuLV. Purified disrupted virus (50 mg) was 
fractionated by a combination of phosphocellulose and Sephadex G-75 
chromatography, and the protease activity of each fraction was 
determined as described in the text. Protein composition and 
protease activity of Sephadex G-75 fractions 20-23 are shown: -, 
fraction alone; +, fraction plus Gz-MSV Pr65^gag. The gel was 
stained with silver nitrate. 



of NaCl. Much of the protease activity was extracted with 
0-0.5 M NaCl in PD buffer, while most of the membrane 
proteins, including p15E and p15, stayed in the insoluble 
residue. The protease-active extracts were pooled, 
lyophilized, and dissolved in 3 ml of PD buffer, then fractionated on 
a Sephacryl S200 column in the cold and further purified by 
RP-HPLC using a μBondapak C18 column as shown in Fig. 
2A. Fractions (5 ml) were collected and lyophilized to 
recover proteins. Purity and protease activity were 
determined by NaDodSO4/PAGE analysis as shown in Fig. 2 B 
and C, respectively. The peak activity was eluted at about 
33% acetonitrile (fraction 24 of Fig. 2) and clearly separated 
from p30, p12, p10, and other low molecular weight proteins. 
The purified protein showed a single band in 
NaDodSO4/PAGE (Fig. 2B). When incubated with disrupted 
Gz-MSV it cleaved Pr65^gag to produce Pr40^gag, Pr27^gag, and p30 
(Fig. 2C). The total protein recovered in RP-HPLC fractions 
22-24 was 14 μg. 

In the absence of a quantitative assay for the protease the 
determination of the recovery of enzymatic activity is 
difficult. If we define a unit as the activity (per unit volume) 
capable of 50% reduction of Pr65^gag band intensity after 
16-hr incubation (see Materials and Methods), we can 
estimate that we extracted a total of 110 units of activity from 
the virus and found 66 units in HPLC fractions (Fig. 2). This 
corresponds to 60% overall recovery of protease activity. 
The possibility for the actual protease being a minor 
component copurifying with the protein peak cannot be 
completely excluded. However, this is unlikely since in 
NIH/3T3 cells transfected with cloned viral DNA having 
deletions only in the protease region, Pr65^gag is synthesized 
but not processed into mature protein components 
(unpublished observations, and S. Crawford and S. P. Goff, 
personal communication). 
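
The recovery estimate quoted above is a simple ratio of the two unit totals. A minimal sketch of that arithmetic follows; the variable names are illustrative, and the unit itself is the semiquantitative one defined in the text, so the result carries the same uncertainty.

```python
# Activity totals as reported in the text, in the semiquantitative units
# defined there (50% reduction of the Pr65^gag band after 16 hr).
units_extracted = 110   # total extracted from the virus
units_in_hplc = 66      # total found in the RP-HPLC fractions

recovery_percent = 100 * units_in_hplc / units_extracted
print(f"Overall recovery: {recovery_percent:.0f}%")
```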

NH2-Terminal Amino Acid Sequence of the Protease. To 
determine the NH2-terminal amino acid sequence of purified 
protease recovered from fraction 24, 0.5 nmol of protein was 
degraded in a single microsequence analysis. The amino 
Fig. 2. Purification of protease by RP-HPLC. Purified virus (50 
mg) was extracted with acetone. Protease activity extracted from 
acetone powder was further fractionated by Sephacryl S200 and 
then by RP-HPLC (μBondapak C18 column 0.39 x 30 cm, Waters 
Associates). (A) Absorbance profile. Gradient conditions were 
0-20% (vol/vol) acetonitrile over 20 min; 20-30% acetonitrile over 
40 min; isocratic at 30% acetonitrile for 30 min, and 30-60% 



                  5                  10
Thr-Leu-Asp-Asp-Gln-Gly-Gly-Gln- X -Gln-
240 575 300 490 240 140 220 230     195

                 15                  20
Glu-Pro-Pro-Pro-Glu- X -Arg-Ile-Thr-Leu-
180  65  90  85 110     135  50  50  70

Fig. 3. NH2-terminal sequence of Mo-MuLV protease. The 
number under each assigned residue is the recovery (in pmol) of that 
residue. X, an unidentified residue. 



acids identified at each cycle (first 20) are shown in Fig. 3 
together with the quantitative yields. 

COOH-Terminal Sequence Analysis. Purified protease was 
digested for various time periods with carboxypeptidase Y 
and the amino acids released were determined on the ana- 
lyzer. The data shown in Table 1 allow us to conclude that 
the COOH terminus of the protease is leucine. Other amino 
acids released in smaller quantities were valine, glutamine 
(or serine), and proline. The kinetic analysis data do not by 
themselves define an accurate sequence, but they could be 
interpreted by comparison with the amino acid sequence 
deduced from the DNA sequence (12) as will be discussed 
below. 

Position of Protease on Viral Genome and Suppression of 
Amber Codon into Glutamine. To determine whether the 
protease protein is virus encoded or not, we aligned the 
NH2- and COOH-terminal amino acid sequences with 
nucleotide sequences. As shown in Fig. 4, the protease amino acid 
sequence starts with threonine encoded by triplet 2223-2225 
and includes the last four amino acids of the gag region. 
Furthermore, the amber codon (UAG) is translated as 
glutamine, which is residue five of the protease. This is 
followed by a glycine residue encoded by the first triplet of 
the pol gene, indicating that translation continues in the 
same reading frame. These results clearly show that the 
Pr65^gag-specific protease is a virus-encoded enzyme and that 
it is synthesized by reading through the termination codon. 
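
The readthrough described above can be sketched by translating the gag-pol junction twice, once stopping at the amber codon and once reading it as glutamine. The codons are transcribed from the pMLV-1 row of Fig. 4; taking the two Asp codons as GAT and GAC is an assumption where the printed codons are hard to read, and the lookup table covers only the codons that occur in this stretch. The function name is hypothetical.

```python
# Partial codon table: only the codons present at the junction (assumption:
# the two Asp codons of Fig. 4 are GAT and GAC).
CODON_TABLE = {
    "CTC": "Leu", "CTG": "Leu", "CTA": "Leu",
    "ACC": "Thr", "GAT": "Asp", "GAC": "Asp",
    "GGA": "Gly", "GGT": "Gly", "CAG": "Gln",
    "CCC": "Pro", "CCT": "Pro", "GAA": "Glu",
    "TAG": "***",  # amber termination codon (UAG in the mRNA)
}


def translate(codons, amber_as=None):
    """Translate codons; stop at the amber codon unless amber_as names a residue."""
    residues = []
    for codon in codons:
        residue = CODON_TABLE[codon]
        if residue == "***":
            if amber_as is None:
                break              # normal termination: gag product ends here
            residue = amber_as     # suppression: amber codon read as a residue
        residues.append(residue)
    return residues


junction = ("CTC CTG ACC CTA GAT GAC TAG GGA GGT CAG "
            "GGT CAG GAC CCC CCC CCT GAA").split()

terminated = translate(junction)                   # ends with the gag C terminus
readthrough = translate(junction, amber_as="Gln")  # continues into pol
protease_start = readthrough[2:]                   # protease begins at Thr

print("-".join(terminated))
print("-".join(protease_start))
```

With the amber codon read as Gln, the fifth residue counted from Thr is glutamine and the sixth is the glycine encoded by the first pol triplet, in agreement with the alignment described in the text.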

It is also seen that the determined NH2-terminal sequence 
for the protease (15 residues are shown) matches the amino 
acid sequence predicted from the nucleotide sequence of 
proviral DNA designated pMLV-1 (12) except at position 11, 
where the protein has Glu instead of Asp. At this juncture it 
is important to point out that the pMLV-1 clone is not 
infectious and that the infectious clone (pMLV-48) of Miller 
and Verma (22), like the protein, also has Glu in position 11. 
A single base change, C → G, in the codon accounts for this 
difference. A comparison of the COOH-terminal sequence 
analysis results with the translated sequence of proviral 
DNA indicates that the COOH-terminal Leu must be encoded by 
codon 2595-2597, which is adjacent to the NH2-terminal Thr 
of reverse transcriptase (11). This determines that the 
COOH-terminal sequence is Pro-Leu-Gln-Val-Leu-OH and that the 
protease is composed of 125 amino acids (Figs. 4 and 5). 
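
The length of 125 residues follows directly from the two codon coordinates given in the text. The short check below uses only those coordinates; the variable names are illustrative. It also shows that the stretch after the amber codon is 360 nucleotides, the same distance quoted in the introduction for the start of reverse transcriptase.

```python
# Nucleotide coordinates quoted in the text.
first_codon_start = 2223  # first base of the NH2-terminal Thr codon (2223-2225)
last_codon_end = 2597     # last base of the COOH-terminal Leu codon (2595-2597)

n_nucleotides = last_codon_end - first_codon_start + 1
n_residues, remainder = divmod(n_nucleotides, 3)

# The amber codon is the fifth codon of the protease reading frame.
amber_start = first_codon_start + 4 * 3
amber_codon = (amber_start, amber_start + 2)
nt_after_amber = last_codon_end - amber_codon[1]

print(n_nucleotides, n_residues, remainder)
print(amber_codon, nt_after_amber)
```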

DISCUSSION 

We have succeeded in purifying from Mo-MuLV a 
virus-encoded protease that is capable of processing in vitro the 
gag precursor polyprotein Pr65^gag into the constituent struc- 

acetonitrile over 90 min at a constant flow rate of 1.0 ml/min. (B) 
Purity of RP-HPLC-separated proteins by NaDodSO4/PAGE. 
One-twentieth of each fraction was lyophilized and analyzed; staining 
was with silver nitrate. Lane Mk contained molecular weight 
markers: phosphorylase b, 92,000; bovine serum albumin, 68,000; 
ovalbumin, 46,000; carbonic anhydrase, 29,000; lysozyme, 14,400. 
(C) Assay of RP-HPLC fractions for protease activity. 
One-twentieth of each fraction was lyophilized and assayed for protease 
activity as described in text. Proteins are visualized by silver 
staining. Lanes Mk as in B. 
Table 1. COOH-terminal amino acid sequence analysis of 
Mo-MuLV protease 

Digestion time,     Amino acids released,* pmol
min                 Leu     Val     Gln     Pro
 0.5                 70      29       0       0
 1.0                101      30      33       0
 2.0                123      31      45      30
 5.0                140      48      49      31
10.0                158      67      55      48
20.0                191      72      62      60

*Each sample analyzed at each time point had 200 pmol of 
protein. 
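
The values in Table 1 can be put on a per-molecule basis by dividing by the 200 pmol of protein in each sample. The sketch below does that and also records the first time point at which each amino acid appears. It is only a reading aid under the assumption that the tabulated values are used as printed; as the text notes, the kinetics alone do not define the sequence.

```python
PROTEIN_PMOL = 200  # protein per sample, from the table footnote

# Amino acids released (pmol) at each digestion time (min), from Table 1.
released = {
    0.5:  {"Leu": 70,  "Val": 29, "Gln": 0,  "Pro": 0},
    1.0:  {"Leu": 101, "Val": 30, "Gln": 33, "Pro": 0},
    2.0:  {"Leu": 123, "Val": 31, "Gln": 45, "Pro": 30},
    5.0:  {"Leu": 140, "Val": 48, "Gln": 49, "Pro": 31},
    10.0: {"Leu": 158, "Val": 67, "Gln": 55, "Pro": 48},
    20.0: {"Leu": 191, "Val": 72, "Gln": 62, "Pro": 60},
}


def mol_per_mol(time_min):
    """Moles of each amino acid released per mole of protein at one time point."""
    return {aa: pmol / PROTEIN_PMOL for aa, pmol in released[time_min].items()}


final = mol_per_mol(20.0)
ranking = sorted(final, key=final.get, reverse=True)

# First digestion time at which each amino acid is detected.
first_seen = {}
for t in sorted(released):
    for aa, pmol in released[t].items():
        if pmol > 0 and aa not in first_seen:
            first_seen[aa] = t

print(final)
print(ranking)
print(first_seen)
```

Leucine is released earliest and in the largest amount, which is the basis for assigning it as the COOH terminus.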



tural proteins, which appear to be the same as those 
occurring in mature virions. We have definitely identified cleavage 
products p30, p10, and p15. The identification of p15 was 
aided by the high sensitivity of detection after its specific 
labeling with [3H]myristate (unpublished data). The fourth 
gag-gene encoded protein, p12, however, could not be 
identified with absolute certainty in our assay system due to 
its poor affinity for Coomassie blue and to the apparent lack 
of specificity of the more sensitive silver stain (nucleic acids 
are stained equally well as proteins). 

The NH2- and COOH-terminal sequences of the 
Mo-MuLV protease as determined in this study and the 
availability of DNA sequences now present an opportunity for 
deducing the complete primary structure of this 
virus-encoded proteolytic enzyme. The complete amino acid 
sequence of the Mo-MuLV protease aligns (Fig. 5) without 
gaps with the amino acid sequences of putative proteases as 
inferred from DNA sequences of AKR mouse leukemia virus 
(AKV) (23), feline leukemia virus (FeLV) (24), and baboon 
endogenous virus (BaEV) (25). It is seen from this alignment 
that there are only four amino acid differences between the 
two mouse proteases compared. With respect to the 
Mo-MuLV sequence, FeLV and BaEV proteases have 25 (20%) 
and 39 (31.2%) changes, respectively, indicating highly 
conserved primary structures. From the combined results it 
appears that there are three variable regions: one at the 
NH2-terminal region (residues 1-12), the second in the 
middle part of the molecule (residues 60-77), and the third at 
the COOH-terminal region (residues 108-120). The last five 
residues of the MuLV and FeLV sequences are identical, 
while in the BaEV sequence there are two substitutions, Leu 
→ Ile and Val → Ile. These changes, however, would not 
significantly alter the nature of the cleavage site between the 
protease and reverse transcriptase. The regions involving 
residues 29-41 are identical among the sequences, and 
another highly conserved long stretch is present, extending 
from residue 78 to residue 101. Little is known about the 



active site of the retroviral protease, but inhibition studies 
done with avian protease (8) suggested that cysteine may be 
involved. Furthermore, it was also shown that the mouse 
protease is inhibited by tosyllysyl chloromethyl ketone 
(TLCK) (4). It is known that TLCK can inhibit thiol proteases or 
similar enzymes just as well as serine proteases, and on 
occasions it has been effectively used to identify cysteines at 
the active site (26). Interestingly, with the exception of 
BaEV, each of the viral proteases, including those of avian 
myeloblastosis virus and avian sarcoma virus, has only a 
single cysteine (residue 88 in the alignment of Fig. 5), which 
is preceded by Asp or Glu. It will be important to develop 
quantitative assays, perhaps utilizing synthetic peptide 
substrates (27), for the retroviral proteases to characterize them 
more completely. 
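
The percentages quoted for the alignment follow from the number of substitutions divided by the 125 residues of the protease. A minimal sketch of that calculation, using only the counts stated in the text (the AKV percentage is not given in the text and is computed here for comparison):

```python
PROTEASE_LENGTH = 125  # residues in the Mo-MuLV protease

# Amino acid differences relative to Mo-MuLV, as counted in the text.
differences = {"AKV": 4, "FeLV": 25, "BaEV": 39}

percent_changed = {
    virus: 100 * n / PROTEASE_LENGTH for virus, n in differences.items()
}
percent_identity = {
    virus: 100 - changed for virus, changed in percent_changed.items()
}

for virus in differences:
    print(f"{virus}: {percent_changed[virus]:.1f}% changed, "
          f"{percent_identity[virus]:.1f}% identical")
```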

Our results, together with the previously determined NH2- 
and COOH-terminal sequences of reverse transcriptase (11) 
and their alignments with nucleotide sequences, suggest the 
map order for Pr180^gag-pol of the mouse retrovirus to be 
5'-p15-p12-p30-p10-protease-reverse 
transcriptase-endonuclease-3'. The polyprotein itself most likely has no 
proteolytic activity. It remains to be seen how the active 
protease is generated. Autocatalysis or an initial cleavage by 
another enzyme (probably cellular) may be responsible. 

The most significant result reported in this study relevant 
to virus replication is the finding that in vivo translation of 
the pol gene resulting in the synthesis of the precursor 
polyprotein Pr180^gag-pol occurs through in-frame readthrough 
of the amber termination codon. While we do not know the 
exact mechanism by which glutamine is inserted at the 
termination site in the translation process taking place in 
mouse fibroblasts, we can assume that this insertion is 
accomplished via the misreading of the UAG codon by 
normal tRNA^Gln due to the wobble in the 3' position of the 
anticodon. A more remote possibility is suppression by a 
specific nonsense suppressor tRNA. Such tRNAs have been 
identified not only in prokaryotes but also recently in 
eukaryotes (28, 29). Suppression of termination codons has 
been proposed to occur in plant viruses (30-32) and 
alphaviruses (33, 34). It appears that plant and animal 
viruses are capable of effectively utilizing the translational 
readthrough mechanism to produce from a single initiation 
site different amounts of proteins and polyproteins required 
for specific functions. The importance of this translational 
control for virus replication, infectivity, and pathogenicity 
could be directly tested by utilizing mutants in which the 
respective termination codons are eliminated. 

In Mo-MuLV the gag and pol genes are in the same 
reading frame. However, available nucleic acid sequences 
indicate that apparently this is not true for Rous sarcoma 
virus (14), FeLV (24), human T-cell leukemia virus (35), and 
bovine leukemia virus (36). As with the Mo-MuLV study, 
protein sequencing will reveal whether in-frame suppres- 



            C-terminus of p10                                                                           N-terminus of RT
            Leu Leu Thr Leu Asp Asp *** Gly Gly Gln Gly Gln Asp Pro Pro Pro Glu ... Pro Leu Gln Val Leu Thr Leu Asn Ile
                            gag -->|   |--> pol
pMLV-1      CTC CTG ACC CTA GAT GAC TAG GGA GGT CAG GGT CAG GAC CCC CCC CCT GAA ... CCC CTG CAA GTG TTG ACC CTA AAT ATA
pMLV-48                                                       G
                    2223            Gln                                                            2597
Determined          Thr Leu Asp Asp Gln Gly Gly Gln  X  Gln Glu Pro Pro Pro Glu ...(Pro Leu Gln Val)Leu|Thr Leu Asn Ile
                    |<---------------------------------- PROTEASE ----------------------------------->|

Fig. 4. Alignment of NH2- and COOH-terminal amino acid sequences with DNA sequences of pMLV-1 (12) and pMLV-48 (21). The amber 
codon UAG is translated as glutamine (double underline). RT, reverse transcriptase. 



1622 Biochemistry: Yoshinaka et al.     Proc. Natl. Acad. Sci. USA 82 (1985) 



                                     10                            20                            30
Mo-MuLV  ThrLeuAspAspGlnGlyGlyGlnGlyGlnGluProProProGluProArgIleThrLeuLysValGlyGlyGlnProValThrPheLeu
AKV      . . . . X Thr
FeLV     Asn . Glu . X GluSer . . . Asp ArgIle
BaEV     X . Cys . . SerGlyAla Leu . . Ser . . . His . ThrThr . .

                                     40                            50                            60
Mo-MuLV  ValAspThrGlyAlaGlnHisSerValLeuThrGlnAsnProGlyProLeuSerAspLysSerAlaTrpValGlnGlyAlaThrGlyGly
AKV      Arg . . . .
FeLV     ArgProAsp ArgThr . Leu Ser
BaEV     LysAlaAsn . . . . SerArgThrSer Arg

                                     70                            80                            90
Mo-MuLV  LysArgTyrArgTrpThrThrAspArgLysValHisLeuAlaThrGlyLysValThrHisSerPheLeuHisValProAspCysProTyr
AKV
FeLV     . Asn Arg . Gln Tyr . . Glu . . .
BaEV     . MetHisLys . . AsnArg . Thr . Asn . GlyGln . Met Val . . Glu . . .

                                    100                           110                           120
Mo-MuLV  ProLeuLeuGlyArgAspLeuLeuThrLysLeuLysAlaGlnIleHisPheGluGlySerGlyAlaGlnValMetGlyProMetGlyGlnProLeuGlnValLeu
AKV      Val . . Lys
FeLV     . . Thr . Glu . . Asn . Val . . Arg . Leu . .
BaEV     Gly SerGluAla . . . . LeuAspArgAsp . . . Ile . Ile .

• 

Fig. 5. Amino acid sequence alignment of Mo-MuLV protease with corresponding sequences of AKR mouse leukemia virus (AKV) (23), 
feline leukemia virus (FeLV) (24), and baboon endogenous virus (BaEV) (25) inferred from nucleotide sequences. Corresponding to position 
5 (Gln in Mo-MuLV and X in the other sequences) the genes have an amber termination codon. In this alignment the predicted NH2-terminal 
sequence (positions 1-12) of the putative FeLV protease is translated from the nucleotide sequence of Laprevotte et al. (24) in the gag reading 
frame. The remainder of the sequence, starting with Pro in position 12, is in a reading frame as published (24). The DNA sequence between 
codon GAC (Asp in position 11) and CCC (Pro in position 12) has an additional C which was not decoded for the purposes of this alignment. 
It remains to be seen whether the FeLV suppression occurs in frame, as in MuLV, or requires a frameshift. The third possibility of course is 
splicing, as proposed by Laprevotte et al. (24). 



sion, frameshift suppression, or splicing is involved in these 
latter cases. In all cases the elucidation of the actual mechanism 
responsible for suppression will require further biochemical 
experiments with purified tRNAs. 